#ForeCast report

Team: CaiXiaoYi, DingAoDi, DuNaiHe, LiaoNing, WangLin

Research

The analysis of temporal data is an important issue in current research, because most real-world data either explicitly or implicitly contain some information about time.
In the current forecast data, there are three sheets, namely mass, food and drug.

In the treatment of the data, we did not forecast the three sheets together, because they differ in certain real-world factors.

The data come from the United States, so I looked into American consumption habits and how they differ across the mass, food and drug store types.

For example, FOOD sells the most because in the US people like to stop at FOOD stores after work or school to buy fresh pies, cakes and other fast food. Such fast-moving consumer goods (FMCG) need to be bought fresh, so shoppers do not stock up on them.


MASS stores are the largest supermarkets, but they do not sell more than FOOD. MASS is suited to family shopping trips; it may operate like a wholesale market and may have a scale cost advantage over DRUG, so its sales fall in the mid range. People tend to go at weekends, when the whole family drives out to buy food for a week or so, and because fresh pie has a limited expiry date they do not buy much of it at a time. DRUG, like the small supermarkets near home, offers only a single flavour of pie, so it has the lowest sales.


Therefore, we did not combine the data from these three size classes of supermarkets in this forecast.

However, this forecast has certain limitations: each year contributes very few data points, and some years have missing months; for example, 2004 and 2009 each contain only half a year of data. In the traditional model (time regression), we therefore estimate each annual total from the monthly averages and then use these five totals to forecast the annual trend. Because it is inherently difficult to predict three points from five, only the trend is forecast at the annual level.
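One way to read the monthly-average approach is that a year with only six observed months is scaled up to a twelve-month figure by multiplying its monthly mean by 12. The sketch below is a minimal illustration under that assumption, not necessarily the exact procedure used for the models in this report. The helper name `annualise` is hypothetical; the volumes are the mass-sheet annual sums reported later in this report, and the month counts follow from the monthly rows (2004 covers Jul to Dec, 2009 covers Jan to Jun).

```r
# Annual volume sums for the mass sheet, as printed in the mass_year table
vol_sum <- c(`2004` = 610352.7, `2005` = 1376441.3, `2006` = 2015903.9,
             `2007` = 1917297.6, `2008` = 1673618.2, `2009` = 484381.9)

# Number of months actually observed in each year
n_months <- c(6, 12, 12, 12, 12, 6)

# Hypothetical helper: monthly average times 12 gives a 12-month equivalent
annualise <- function(total, months_observed) {
  total / months_observed * 12
}

annualised <- annualise(vol_sum, n_months)
round(annualised)
```

Full years are left unchanged, while the two half years are doubled. Since the observed half of 2004 contains the November peak, a figure scaled this way is probably on the high side for that year, which is one more reason to treat the annual results as a trend only.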

In terms of model use

For this prediction task, our team used not only traditional models (stl, ETS, HW, time regression) but also some newer ones, such as auto_arima_xgboost, randomForest, earth, prophet_xgboost, stlm_ets, stlm_arima and prophet, about a dozen models in total. Some of them combine a traditional model with a machine learning algorithm to good effect, and these are the kinds of models that are popular in Kaggle competitions.
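To give a feel for the simplest of the traditional models, the sketch below runs an STL decomposition with base R's `stats::stl` on the quarterly food volumes reported later in this report. It is an illustrative sketch only: the models in this report may have been fitted with other packages and on monthly data, and the variable names here are assumptions.

```r
# Quarterly food volume (vol_sum from the food_quarter table), 2004-Q3 to 2009-Q2
food_vol <- ts(c(45254119, 53708734, 40131741, 40838462, 43427850,
                 53842154, 42313931, 41319440, 40190757, 48449256,
                 34878405, 37263025, 37785439, 46781096, 33592958,
                 34784510, 34917310, 43336629, 34253703, 37599772),
               start = c(2004, 3), frequency = 4)

# Split the series into seasonal, trend and remainder components.
# s.window = "periodic" forces the same seasonal pattern in every year.
fit <- stl(food_vol, s.window = "periodic")

# Average seasonal effect for each quarter (1 to 4)
seasonal <- as.numeric(fit$time.series[, "seasonal"])
quarter_effect <- tapply(seasonal, as.integer(cycle(food_vol)), mean)
quarter_effect
```

The three components add back up to the original series, so the seasonal effect per quarter can be read directly as volume above or below the trend.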

The report is exported in HTML format because this way plotly can be used and the results can be viewed interactively.

Code

Data pre-processing

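# author:Cai
# rm(list = ls())
# Packages used by the code in this report
# (assumption: read.xlsx with a `sheet` argument comes from openxlsx)
library(openxlsx)   # read.xlsx
library(stringr)    # str_c
library(lubridate)  # year
library(dplyr)      # %>%, group_by, summarise
library(ggplot2)    # ggplot, geom_line

# Extract the month column plus one volume/price pair from a sheet.
# The volume column is at position 1 + cnum, the price column at 73 + cnum.
get_data_by_no <- function(df, cnum){
  first_cnum <- 1 + cnum
  second_cnum <- 73 + cnum

  res <- data.frame(df[1], df[first_cnum], df[second_cnum])
  # Skip the first row of the sheet; the monthly records start at row 2
  res <- res[2:nrow(res),]
  names(res) <- c("Month", "VOLUMN", "PRICE")
  return (res)
}

mass <- read.xlsx("/Users/wangzuxian/Data_for_test1.xlsx", sheet = 1)
food <- read.xlsx("/Users/wangzuxian/Data_for_test1.xlsx", sheet = 2)
drug <- read.xlsx("/Users/wangzuxian/Data_for_test1.xlsx", sheet = 3)

cnum <- 7
mass <- get_data_by_no(mass, cnum)
food <- get_data_by_no(food, cnum)
drug <- get_data_by_no(drug, cnum)
# Note: `all` stacks the mass and food rows only; drug is not included
all <- rbind(mass, food)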

mass
##       Month             VOLUMN               PRICE
## 2  2004_Jul 91942.571429999996 0.54006200000000004
## 3  2004_Aug 80327.285709999996 0.64228099999999999
## 4  2004_Sep 67387.857139999993 0.98359300000000005
## 5  2004_Oct 81307.071429999996  1.3260529999999999
## 6  2004_Nov 172584.28570000001  2.1541800000000002
## 7  2004_Dec 116803.64290000001  1.8618030000000001
## 8  2005_Jan 67193.464290000004            1.111264
## 9  2005_Feb 65111.321430000004            1.109623
## 10 2005_Mar 81389.357139999993            1.255217
## 11 2005_Apr 90105.428570000004 0.96763299999999997
## 12 2005_May 136057.89290000001 0.78552500000000003
## 13 2005_Jun          107802.75 0.90408299999999997
## 14 2005_Jul 114619.78569999999 0.85580400000000001
## 15 2005_Aug 152440.71429999999 0.79672600000000005
## 16 2005_Sep 168344.78570000001 0.75808799999999998
## 17 2005_Oct 127759.21430000001 0.99165000000000003
## 18 2005_Nov 143462.07139999999  2.0077929999999999
## 19 2005_Dec 122154.53569999999            1.797272
## 20 2006_Jan           88740.25            1.008996
## 21 2006_Feb 87689.428570000004            1.055606
## 22 2006_Mar          123369.75 0.99135399999999996
## 23 2006_Apr 155312.03570000001 0.85621400000000003
## 24 2006_May 140064.92860000001 0.85486399999999996
## 25 2006_Jun 135176.78570000001 0.95822399999999996
## 26 2006_Jul 178455.57139999999 0.72394099999999995
## 27 2006_Aug 216618.21429999999 0.77141300000000002
## 28 2006_Sep 198734.92860000001 0.83291000000000004
## 29 2006_Oct 215120.07139999999 0.95692200000000005
## 30 2006_Nov 261636.85709999999  1.9120539999999999
## 31 2006_Dec 214985.07139999999            1.834578
## 32 2007_Jan 149417.78570000001 0.98325700000000005
## 33 2007_Feb 141932.71429999999 0.89841199999999999
## 34 2007_Mar 174709.42860000001 0.87739900000000004
## 35 2007_Apr             169455 0.94608199999999998
## 36 2007_May 168988.82139999999 0.94844300000000004
## 37 2007_Jun 162072.32139999999            1.191935
## 38 2007_Jul 137077.07139999999            1.318379
## 39 2007_Aug 167178.39290000001  1.0474460000000001
## 40 2007_Sep 181669.89290000001  1.0151950000000001
## 41 2007_Oct 165415.42860000001  1.3144169999999999
## 42 2007_Nov 173307.92860000001  2.3762050000000001
## 43 2007_Dec 126072.78569999999  2.1934399999999998
## 44 2008_Jan 105059.14290000001  1.4437770000000001
## 45 2008_Feb 149494.57139999999  1.0774969999999999
## 46 2008_Mar        103524.5714  1.7223170000000001
## 47 2008_Apr 174153.64290000001  1.1047309999999999
## 48 2008_May             197059            1.033892
## 49 2008_Jun        130458.9286  1.4606319999999999
## 50 2008_Jul 133018.07139999999  1.3096939999999999
## 51 2008_Aug 146845.21429999999            1.248834
## 52 2008_Sep 95496.964290000004  1.5976870000000001
## 53 2008_Oct 78102.107139999993            1.838252
## 54 2008_Nov 215058.28570000001  1.9767680000000001
## 55 2008_Dec 145347.67860000001            1.793644
## 56 2009_Jan 79226.178570000004  1.6557230000000001
## 57 2009_Feb        53561.14286  1.7516210000000001
## 58 2009_Mar 68385.392860000007  1.8233410000000001
## 59 2009_Apr 87508.821429999996             1.69678
## 60 2009_May 87781.785709999996  1.6274189999999999
## 61 2009_Jun        107918.5714            1.477811
food
##       Month             VOLUMN              PRICE
## 2  2004_Jul 14580047.289999999 1.4920530000000001
## 3  2004_Aug 14809137.640000001 1.4252750000000001
## 4  2004_Sep        15864933.93           1.416803
## 5  2004_Oct 17125285.890000001 1.5655950000000001
## 6  2004_Nov        20660632.68 2.4059710000000001
## 7  2004_Dec 15922815.859999999 2.3659119999999998
## 8  2005_Jan 12963245.289999999 1.4825759999999999
## 9  2005_Feb           12795191 1.3988100000000001
## 10 2005_Mar 14373304.289999999 1.4715279999999999
## 11 2005_Apr        13649767.43            1.39777
## 12 2005_May 13877404.890000001           1.423743
## 13 2005_Jun        13311289.32           1.508758
## 14 2005_Jul 13453650.640000001           1.475236
## 15 2005_Aug 14053982.539999999           1.383799
## 16 2005_Sep 15920217.109999999 1.3034220000000001
## 17 2005_Oct 16409994.359999999           1.581861
## 18 2005_Nov        21044427.93 2.4530940000000001
## 19 2005_Dec 16387731.859999999 2.3168510000000002
## 20 2006_Jan        13403199.93 1.4844440000000001
## 21 2006_Feb        13562727.57 1.3823460000000001
## 22 2006_Mar 15348003.859999999           1.402471
## 23 2006_Apr        14298922.93 1.5179940000000001
## 24 2006_May 13635635.359999999 1.5351300000000001
## 25 2006_Jun 13384881.710000001           1.626436
## 26 2006_Jul 13270425.960000001           1.632979
## 27 2006_Aug        13210034.32           1.661734
## 28 2006_Sep        13710296.43 1.6706650000000001
## 29 2006_Oct 14707820.039999999           1.877999
## 30 2006_Nov        19268178.75           2.801139
## 31 2006_Dec 14473257.289999999 2.5505209999999998
## 32 2007_Jan        11343783.25           1.807471
## 33 2007_Feb        11147825.57 1.6887110000000001
## 34 2007_Mar 12386796.390000001 1.7166399999999999
## 35 2007_Apr 11951512.859999999 1.7857419999999999
## 36 2007_May        12634817.93 1.6813020000000001
## 37 2007_Jun        12676693.93           1.738186
## 38 2007_Jul 12036229.210000001 1.7763720000000001
## 39 2007_Aug        12535093.57 1.7050259999999999
## 40 2007_Sep        13214116.57 1.6688559999999999
## 41 2007_Oct        14616719.57 1.8079209999999999
## 42 2007_Nov 18908816.359999999 2.8379569999999998
## 43 2007_Dec        13255560.32 2.5543979999999999
## 44 2008_Jan 11116518.890000001           1.852541
## 45 2008_Feb 10703282.289999999 1.8026260000000001
## 46 2008_Mar 11773156.710000001 1.8430930000000001
## 47 2008_Apr 11475395.789999999           1.776068
## 48 2008_May 11807544.039999999           1.835863
## 49 2008_Jun 11501570.609999999           1.928809
## 50 2008_Jul 11730316.789999999 1.8281970000000001
## 51 2008_Aug 11499208.710000001           1.869837
## 52 2008_Sep        11687784.82 1.8807499999999999
## 53 2008_Oct 12939811.039999999           2.137677
## 54 2008_Nov 17506532.710000001           3.144949
## 55 2008_Dec 12890285.640000001 2.8854950000000001
## 56 2009_Jan 10582184.210000001 2.1201300000000001
## 57 2009_Feb 10634505.289999999           1.839275
## 58 2009_Mar        13037013.57 1.8388500000000001
## 59 2009_Apr 12569873.039999999 1.8409899999999999
## 60 2009_May 12705934.109999999           1.802271
## 61 2009_Jun 12323964.640000001 1.9154990000000001
drug
##       Month             VOLUMN               PRICE
## 2  2004_Jul 342038.89289999998 0.91586999999999996
## 3  2004_Aug 353867.96429999999 0.87870499999999996
## 4  2004_Sep             353385 0.89099899999999999
## 5  2004_Oct 382660.53570000001 0.91813599999999995
## 6  2004_Nov 396199.82140000002  1.1610769999999999
## 7  2004_Dec 374950.21429999999  1.0878859999999999
## 8  2005_Jan 373119.71429999999 0.90710500000000005
## 9  2005_Feb          363302.75 0.95036500000000002
## 10 2005_Mar 373210.10710000002 0.91667500000000002
## 11 2005_Apr 356498.57140000002 0.89837999999999996
## 12 2005_May 373107.92859999998 0.88058899999999996
## 13 2005_Jun 316255.67859999998 0.88217500000000004
## 14 2005_Jul 309310.39289999998 0.90456899999999996
## 15 2005_Aug          318453.25 0.91396599999999995
## 16 2005_Sep 322646.35710000002            0.887374
## 17 2005_Oct 362755.67859999998 0.96897800000000001
## 18 2005_Nov 376950.71429999999  1.1442209999999999
## 19 2005_Dec 351865.07140000002  1.0549839999999999
## 20 2006_Jan 338764.78570000001 0.98413700000000004
## 21 2006_Feb 325562.71429999999 0.96184199999999997
## 22 2006_Mar 366729.03570000001 0.92859899999999995
## 23 2006_Apr 353113.17859999998 0.90074399999999999
## 24 2006_May 355834.21429999999 0.91462399999999999
## 25 2006_Jun 328466.85710000002 0.91182300000000005
## 26 2006_Jul             317527 0.95529200000000003
## 27 2006_Aug 330962.42859999998 0.96611800000000003
## 28 2006_Sep 345281.64289999998 0.96225899999999998
## 29 2006_Oct 344282.39289999998  1.0801959999999999
## 30 2006_Nov 339512.89289999998            1.278718
## 31 2006_Dec 315622.14289999998  1.1533739999999999
## 32 2007_Jan 303314.64289999998  1.0544720000000001
## 33 2007_Feb 307557.35710000002  1.0378909999999999
## 34 2007_Mar 343296.21429999999             1.02915
## 35 2007_Apr 337749.64289999998  1.0184660000000001
## 36 2007_May 349303.28570000001  1.0323960000000001
## 37 2007_Jun 319853.03570000001            1.039404
## 38 2007_Jul          308596.25  1.0461009999999999
## 39 2007_Aug 312512.32140000002  1.0452030000000001
## 40 2007_Sep             336597            1.035852
## 41 2007_Oct          364338.75  1.0903080000000001
## 42 2007_Nov             362315             1.28379
## 43 2007_Dec          342889.25  1.1375189999999999
## 44 2008_Jan 339879.89289999998  1.1090450000000001
## 45 2008_Feb 355203.60710000002            1.048227
## 46 2008_Mar 406840.39289999998 0.99135200000000001
## 47 2008_Apr 332608.28570000001            1.087043
## 48 2008_May 350392.21429999999            1.101599
## 49 2008_Jun 314237.82140000002            1.086986
## 50 2008_Jul 310671.96429999999            1.069949
## 51 2008_Aug          301160.75  1.1224879999999999
## 52 2008_Sep           320567.5  1.1311899999999999
## 53 2008_Oct 324321.32140000002            1.210744
## 54 2008_Nov 328502.64289999998  1.5387139999999999
## 55 2008_Dec 321003.64289999998            1.317642
## 56 2009_Jan 307358.85710000002  1.2528520000000001
## 57 2009_Feb 288944.57140000002  1.2352019999999999
## 58 2009_Mar 325536.89289999998            1.226621
## 59 2009_Apr 316638.39289999998             1.16137
## 60 2009_May 299196.14289999998  1.1962550000000001
## 61 2009_Jun 274674.64289999998  1.2098329999999999
all
##        Month             VOLUMN               PRICE
## 2   2004_Jul 91942.571429999996 0.54006200000000004
## 3   2004_Aug 80327.285709999996 0.64228099999999999
## 4   2004_Sep 67387.857139999993 0.98359300000000005
## 5   2004_Oct 81307.071429999996  1.3260529999999999
## 6   2004_Nov 172584.28570000001  2.1541800000000002
## 7   2004_Dec 116803.64290000001  1.8618030000000001
## 8   2005_Jan 67193.464290000004            1.111264
## 9   2005_Feb 65111.321430000004            1.109623
## 10  2005_Mar 81389.357139999993            1.255217
## 11  2005_Apr 90105.428570000004 0.96763299999999997
## 12  2005_May 136057.89290000001 0.78552500000000003
## 13  2005_Jun          107802.75 0.90408299999999997
## 14  2005_Jul 114619.78569999999 0.85580400000000001
## 15  2005_Aug 152440.71429999999 0.79672600000000005
## 16  2005_Sep 168344.78570000001 0.75808799999999998
## 17  2005_Oct 127759.21430000001 0.99165000000000003
## 18  2005_Nov 143462.07139999999  2.0077929999999999
## 19  2005_Dec 122154.53569999999            1.797272
## 20  2006_Jan           88740.25            1.008996
## 21  2006_Feb 87689.428570000004            1.055606
## 22  2006_Mar          123369.75 0.99135399999999996
## 23  2006_Apr 155312.03570000001 0.85621400000000003
## 24  2006_May 140064.92860000001 0.85486399999999996
## 25  2006_Jun 135176.78570000001 0.95822399999999996
## 26  2006_Jul 178455.57139999999 0.72394099999999995
## 27  2006_Aug 216618.21429999999 0.77141300000000002
## 28  2006_Sep 198734.92860000001 0.83291000000000004
## 29  2006_Oct 215120.07139999999 0.95692200000000005
## 30  2006_Nov 261636.85709999999  1.9120539999999999
## 31  2006_Dec 214985.07139999999            1.834578
## 32  2007_Jan 149417.78570000001 0.98325700000000005
## 33  2007_Feb 141932.71429999999 0.89841199999999999
## 34  2007_Mar 174709.42860000001 0.87739900000000004
## 35  2007_Apr             169455 0.94608199999999998
## 36  2007_May 168988.82139999999 0.94844300000000004
## 37  2007_Jun 162072.32139999999            1.191935
## 38  2007_Jul 137077.07139999999            1.318379
## 39  2007_Aug 167178.39290000001  1.0474460000000001
## 40  2007_Sep 181669.89290000001  1.0151950000000001
## 41  2007_Oct 165415.42860000001  1.3144169999999999
## 42  2007_Nov 173307.92860000001  2.3762050000000001
## 43  2007_Dec 126072.78569999999  2.1934399999999998
## 44  2008_Jan 105059.14290000001  1.4437770000000001
## 45  2008_Feb 149494.57139999999  1.0774969999999999
## 46  2008_Mar        103524.5714  1.7223170000000001
## 47  2008_Apr 174153.64290000001  1.1047309999999999
## 48  2008_May             197059            1.033892
## 49  2008_Jun        130458.9286  1.4606319999999999
## 50  2008_Jul 133018.07139999999  1.3096939999999999
## 51  2008_Aug 146845.21429999999            1.248834
## 52  2008_Sep 95496.964290000004  1.5976870000000001
## 53  2008_Oct 78102.107139999993            1.838252
## 54  2008_Nov 215058.28570000001  1.9767680000000001
## 55  2008_Dec 145347.67860000001            1.793644
## 56  2009_Jan 79226.178570000004  1.6557230000000001
## 57  2009_Feb        53561.14286  1.7516210000000001
## 58  2009_Mar 68385.392860000007  1.8233410000000001
## 59  2009_Apr 87508.821429999996             1.69678
## 60  2009_May 87781.785709999996  1.6274189999999999
## 61  2009_Jun        107918.5714            1.477811
## 210 2004_Jul 14580047.289999999  1.4920530000000001
## 310 2004_Aug 14809137.640000001  1.4252750000000001
## 410 2004_Sep        15864933.93            1.416803
## 510 2004_Oct 17125285.890000001  1.5655950000000001
## 62  2004_Nov        20660632.68  2.4059710000000001
## 71  2004_Dec 15922815.859999999  2.3659119999999998
## 81  2005_Jan 12963245.289999999  1.4825759999999999
## 91  2005_Feb           12795191  1.3988100000000001
## 101 2005_Mar 14373304.289999999  1.4715279999999999
## 111 2005_Apr        13649767.43             1.39777
## 121 2005_May 13877404.890000001            1.423743
## 131 2005_Jun        13311289.32            1.508758
## 141 2005_Jul 13453650.640000001            1.475236
## 151 2005_Aug 14053982.539999999            1.383799
## 161 2005_Sep 15920217.109999999  1.3034220000000001
## 171 2005_Oct 16409994.359999999            1.581861
## 181 2005_Nov        21044427.93  2.4530940000000001
## 191 2005_Dec 16387731.859999999  2.3168510000000002
## 201 2006_Jan        13403199.93  1.4844440000000001
## 211 2006_Feb        13562727.57  1.3823460000000001
## 221 2006_Mar 15348003.859999999            1.402471
## 231 2006_Apr        14298922.93  1.5179940000000001
## 241 2006_May 13635635.359999999  1.5351300000000001
## 251 2006_Jun 13384881.710000001            1.626436
## 261 2006_Jul 13270425.960000001            1.632979
## 271 2006_Aug        13210034.32            1.661734
## 281 2006_Sep        13710296.43  1.6706650000000001
## 291 2006_Oct 14707820.039999999            1.877999
## 301 2006_Nov        19268178.75            2.801139
## 311 2006_Dec 14473257.289999999  2.5505209999999998
## 321 2007_Jan        11343783.25            1.807471
## 331 2007_Feb        11147825.57  1.6887110000000001
## 341 2007_Mar 12386796.390000001  1.7166399999999999
## 351 2007_Apr 11951512.859999999  1.7857419999999999
## 361 2007_May        12634817.93  1.6813020000000001
## 371 2007_Jun        12676693.93            1.738186
## 381 2007_Jul 12036229.210000001  1.7763720000000001
## 391 2007_Aug        12535093.57  1.7050259999999999
## 401 2007_Sep        13214116.57  1.6688559999999999
## 411 2007_Oct        14616719.57  1.8079209999999999
## 421 2007_Nov 18908816.359999999  2.8379569999999998
## 431 2007_Dec        13255560.32  2.5543979999999999
## 441 2008_Jan 11116518.890000001            1.852541
## 451 2008_Feb 10703282.289999999  1.8026260000000001
## 461 2008_Mar 11773156.710000001  1.8430930000000001
## 471 2008_Apr 11475395.789999999            1.776068
## 481 2008_May 11807544.039999999            1.835863
## 491 2008_Jun 11501570.609999999            1.928809
## 501 2008_Jul 11730316.789999999  1.8281970000000001
## 511 2008_Aug 11499208.710000001            1.869837
## 521 2008_Sep        11687784.82  1.8807499999999999
## 531 2008_Oct 12939811.039999999            2.137677
## 541 2008_Nov 17506532.710000001            3.144949
## 551 2008_Dec 12890285.640000001  2.8854950000000001
## 561 2009_Jan 10582184.210000001  2.1201300000000001
## 571 2009_Feb 10634505.289999999            1.839275
## 581 2009_Mar        13037013.57  1.8388500000000001
## 591 2009_Apr 12569873.039999999  1.8409899999999999
## 601 2009_May 12705934.109999999            1.802271
## 611 2009_Jun 12323964.640000001  1.9154990000000001

Aggregate data c. for volume (sum)

#author :Cai
# VOL_CORN CHIPS
# Make sure volume and price are numeric before doing arithmetic on them
mass$VOLUMN <- as.numeric(mass$VOLUMN)
mass$PRICE <- as.numeric(mass$PRICE)
food$VOLUMN <- as.numeric(food$VOLUMN)
food$PRICE <- as.numeric(food$PRICE)
drug$VOLUMN <- as.numeric(drug$VOLUMN)
drug$PRICE <- as.numeric(drug$PRICE)
all$VOLUMN <- as.numeric(all$VOLUMN)
all$PRICE <- as.numeric(all$PRICE)
# weight_sum = volume x price, i.e. the monthly turnover
mass$weight_sum <- mass$VOLUMN * mass$PRICE
food$weight_sum <- food$VOLUMN * food$PRICE
drug$weight_sum <- drug$VOLUMN * drug$PRICE
all$weight_sum <- all$VOLUMN * all$PRICE

1. Aggregate data at
   a. quarter
   b. year
   d. for price (weighted mean)
2. Calculate turnover (price x volume)
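The weighted mean price and the turnover named in this list can be checked by hand on a single quarter. The sketch below uses the first three months of the mass sheet (2004_Jul to 2004_Sep) as printed in this report; the variable names are chosen for illustration only.

```r
# Monthly volume and price for 2004_Jul, 2004_Aug and 2004_Sep (mass sheet)
volume <- c(91942.57, 80327.29, 67387.86)
price  <- c(0.540062, 0.642281, 0.983593)

# Turnover per month is price x volume
turnover <- price * volume

# The volume-weighted mean price is total turnover divided by total volume
weighted_mean_price <- sum(turnover) / sum(volume)

# A plain average of the three prices ignores how much was sold at each price
simple_mean_price <- mean(price)

c(weighted = weighted_mean_price, simple = simple_mean_price)
```

The weighted value should agree, up to rounding, with the 2004-Q3 `weight_mean` entry of the `mass_quarter` table. It is lower than the plain average here because the largest volumes were sold in the months with the lowest prices.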

#author :Cai
quarter_data <- function(df){
  # Use the C locale so that %b parses English month abbreviations
  Sys.setlocale('LC_TIME', 'C')
  month <- str_c('1_', df$Month)
  month <- as.Date(month, format='%d_%Y_%b')

  # Quarter label such as "2004-Q3"
  df$quarter <- str_c(year(month), '-', quarters(month))

  # Quarterly sums of volume and of turnover (weight_sum)
  vol_sum <- aggregate(df$VOLUMN, by=list(type=df$quarter), sum)
  weight_sum <- aggregate(df$weight_sum, by=list(type=df$quarter), sum)

  df_quarter <- data.frame(quarter=vol_sum$type,
                           vol_sum=vol_sum$x,
                           weight_sum=weight_sum$x)
  # Volume-weighted mean price = turnover / volume
  df_quarter$weight_mean <- df_quarter$weight_sum/df_quarter$vol_sum
  return (df_quarter)
}

year_data <- function(df){
  # The first four characters of "2004_Jul" are the year
  df$year <- substring(df$Month, 1, 4)

  # Annual sums of volume and of turnover (weight_sum)
  vol_sumy <- aggregate(df$VOLUMN, by=list(type=df$year), sum)
  weight_sumy <- aggregate(df$weight_sum, by=list(type=df$year), sum)

  df_year <- data.frame(year = vol_sumy$type, vol_sumy = vol_sumy$x,
                        weight_sumy = weight_sumy$x)
  # Volume-weighted mean price = turnover / volume
  df_year$weight_mean <- df_year$weight_sumy/df_year$vol_sumy
  return (df_year)
}
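The quarter labels built inside `quarter_data` depend on the time locale, because `%b` has to recognise English month abbreviations. As a hedged alternative, the sketch below derives the same kind of label without any date parsing. The helper name `month_to_quarter` is hypothetical, and it assumes every key has the form `YYYY_Mon` with an English three-letter month.

```r
# Hypothetical helper: turn keys such as "2004_Jul" into labels such as "2004-Q3"
month_to_quarter <- function(month_key) {
  yr  <- substr(month_key, 1, 4)
  # month.abb is R's built-in vector "Jan", "Feb", ..., "Dec"
  mon <- match(substr(month_key, 6, 8), month.abb)
  # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on
  paste0(yr, "-Q", (mon - 1) %/% 3 + 1)
}

quarter_labels <- month_to_quarter(c("2004_Jul", "2004_Nov", "2005_Feb"))
quarter_labels
```

A key with an unrecognised month abbreviation gives a label containing `NA`, which makes malformed input easy to spot before aggregating.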

Get the year and quarter data

#author :Cai
mass_quarter <- quarter_data(mass)
food_quarter <- quarter_data(food)
drug_quarter <- quarter_data(drug)
all_quarter <- quarter_data(all)

mass_quarter 
##    quarter  vol_sum weight_sum weight_mean
## 1  2004-Q3 239657.7   167529.6   0.6990370
## 2  2004-Q4 370695.0   697060.5   1.8804151
## 3  2005-Q1 213694.1   249080.0   1.1655912
## 4  2005-Q2 333966.1   291528.5   0.8729285
## 5  2005-Q3 435405.3   347165.7   0.7973392
## 6  2005-Q4 393375.8   634279.5   1.6124008
## 7  2006-Q1 299799.4   304407.1   1.0153693
## 8  2006-Q2 430553.8   382246.4   0.8878019
## 9  2006-Q3 593808.7   461821.7   0.7777281
## 10 2006-Q4 691742.0  1100523.8   1.5909455
## 11 2007-Q1 466059.9   427720.0   0.9177361
## 12 2007-Q2 500516.1   513774.3   1.0264889
## 13 2007-Q3 485925.4   540260.2   1.1118173
## 14 2007-Q4 464796.1   905773.1   1.9487535
## 15 2008-Q1 358078.3   491064.1   1.3713874
## 16 2008-Q2 501671.6   586683.1   1.1694566
## 17 2008-Q3 375360.2   510172.5   1.3591544
## 18 2008-Q4 438508.1   829393.7   1.8913989
## 19 2009-Q1 201172.7   349685.3   1.7382343
## 20 2009-Q2 283209.2   450824.2   1.5918418
food_quarter 
##    quarter  vol_sum weight_sum weight_mean
## 1  2004-Q3 45254119   65338783    1.443820
## 2  2004-Q4 53708734  114192126    2.126137
## 3  2005-Q1 40131741   58267757    1.451912
## 4  2005-Q2 40838462   58920608    1.442772
## 5  2005-Q3 43427850   60045958    1.382660
## 6  2005-Q4 53842154  115550223    2.146092
## 7  2006-Q1 42313931   60169812    1.421986
## 8  2006-Q2 41319440   64407806    1.558777
## 9  2006-Q3 40190757   66527202    1.655286
## 10 2006-Q4 48449256  118508465    2.446033
## 11 2007-Q1 34878405   60592685    1.737255
## 12 2007-Q2 37263025   64619715    1.734151
## 13 2007-Q3 37785439   64805939    1.715103
## 14 2007-Q4 46781096  113948259    2.435776
## 15 2008-Q1 33592958   61586845    1.833326
## 16 2008-Q2 34784510   64242449    1.846869
## 17 2008-Q3 34917310   64928777    1.859501
## 18 2008-Q4 43336629  119913144    2.767016
## 19 2009-Q1 34253703   65968498    1.925879
## 20 2009-Q2 37599772   69647089    1.852327
drug_quarter 
##    quarter   vol_sum weight_sum weight_mean
## 1  2004-Q3 1049291.9   939074.4   0.8949601
## 2  2004-Q4 1153810.6  1219256.0   1.0567211
## 3  2005-Q1 1109632.6  1025841.4   0.9244874
## 4  2005-Q2 1045862.2   927818.8   0.8871329
## 5  2005-Q3  950410.0   857156.0   0.9018803
## 6  2005-Q4 1091571.5  1154029.2   1.0572182
## 7  2006-Q1 1031056.5   987075.1   0.9573433
## 8  2006-Q2 1037414.2   943022.7   0.9090127
## 9  2006-Q3  993771.1   955330.1   0.9613181
## 10 2006-Q4  999417.4  1170064.1   1.1707461
## 11 2007-Q1  954168.2   992351.1   1.0400169
## 12 2007-Q2 1006906.0  1037062.4   1.0299496
## 13 2007-Q3  957705.6   998126.3   1.0422058
## 14 2007-Q4 1069543.0  1252420.9   1.1709869
## 15 2008-Q1 1101923.9  1152598.1   1.0459871
## 16 2008-Q2  997238.3  1089123.3   1.0921395
## 17 2008-Q3  932400.2  1033075.2   1.1079740
## 18 2008-Q4  973827.6  1321109.6   1.3566155
## 19 2009-Q1  921840.3  1141290.5   1.2380566
## 20 2009-Q2  890509.2  1057959.7   1.1880390
all_quarter 
##    quarter  vol_sum weight_sum weight_mean
## 1  2004-Q3 45493777   65506313    1.439896
## 2  2004-Q4 54079429  114889187    2.124453
## 3  2005-Q1 40345435   58516837    1.450396
## 4  2005-Q2 41172428   59212136    1.438150
## 5  2005-Q3 43863256   60393124    1.376850
## 6  2005-Q4 54235530  116184502    2.142221
## 7  2006-Q1 42613731   60474219    1.419125
## 8  2006-Q2 41749994   64790052    1.551858
## 9  2006-Q3 40784565   66989024    1.642509
## 10 2006-Q4 49140998  119608989    2.433996
## 11 2007-Q1 35344465   61020405    1.726449
## 12 2007-Q2 37763541   65133489    1.724772
## 13 2007-Q3 38271365   65346199    1.707444
## 14 2007-Q4 47245892  114854032    2.430984
## 15 2008-Q1 33951036   62077909    1.828454
## 16 2008-Q2 35286182   64829133    1.837239
## 17 2008-Q3 35292671   65438950    1.854180
## 18 2008-Q4 43775137  120742537    2.758245
## 19 2009-Q1 34454876   66318184    1.924784
## 20 2009-Q2 37882981   70097913    1.850380
mass_year <- year_data(mass)
food_year <- year_data(food)
drug_year <- year_data(drug)
all_year <- year_data(all)

mass_year 
##   year  vol_sumy weight_sumy weight_mean
## 1 2004  610352.7    864590.1    1.416542
## 2 2005 1376441.3   1522053.7    1.105789
## 3 2006 2015903.9   2248999.1    1.115628
## 4 2007 1917297.6   2387527.6    1.245257
## 5 2008 1673618.2   2417313.4    1.444364
## 6 2009  484381.9    800509.5    1.652641
food_year 
##   year  vol_sumy weight_sumy weight_mean
## 1 2004  98962853   179530909    1.814124
## 2 2005 178240207   292784546    1.642640
## 3 2006 172273384   309613285    1.797221
## 4 2007 156707966   303966598    1.939701
## 5 2008 146631408   310671215    2.118722
## 6 2009  71853475   135615587    1.887391
drug_year 
##   year vol_sumy weight_sumy weight_mean
## 1 2004  2203102     2158330   0.9796777
## 2 2005  4197476     3964845   0.9445784
## 3 2006  4061659     4055492   0.9984816
## 4 2007  3988323     4279961   1.0731230
## 5 2008  4005390     4595906   1.1474304
## 6 2009  1812350     2199250   1.2134801
all_year 
##   year  vol_sumy weight_sumy weight_mean
## 1 2004  99573206   180395499    1.811687
## 2 2005 179616648   294306600    1.638526
## 3 2006 174289288   311862284    1.789337
## 4 2007 158625263   306354125    1.931307
## 5 2008 148305026   313088528    2.111112
## 6 2009  72337857   136416097    1.885819
# Total turnover; note that this overwrites the `all` data frame with a single number
all = sum(all$weight_sum)
#author :Cai
# Check for missing values
sum(is.na(mass))
## [1] 0
sum(is.na(food))
## [1] 0
sum(is.na(drug))
## [1] 0
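`sum(is.na(df))` returns one total for the whole data frame. When a count is not zero, a per-column breakdown shows where the gaps are. The sketch below uses a small made-up data frame (`toy` is hypothetical and not part of the report's data) to show how a non-numeric cell becomes `NA` after `as.numeric` and how `colSums(is.na(...))` locates it.

```r
# Made-up example: the second volume is not a valid number
toy <- data.frame(Month  = c("2004_Jul", "2004_Aug", "2004_Sep"),
                  VOLUMN = c("91942.57", "n/a", "67387.86"),
                  stringsAsFactors = FALSE)

# as.numeric turns text that is not a number into NA (with a warning)
toy$VOLUMN <- suppressWarnings(as.numeric(toy$VOLUMN))

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column
```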

Data visualisation

#author :Cai
# Annual turnover (column 3, weight_sumy) of each data set on one chart
year_plot <- ggplot(all_year)
x <- seq_len(nrow(all_year))
year_plot + geom_line(aes(x=x,y=all_year[,3]),color="red") +
  geom_line(aes(x=x,y=mass_year[,3]),color="blue") +
  geom_line(aes(x=x,y=food_year[,3]),color="green") +
  geom_line(aes(x=x,y=drug_year[,3]),color="skyblue") +
  # Label each x position with its year instead of the row index
  scale_x_continuous(breaks = x, labels = all_year[,1])

mass
##       Month    VOLUMN    PRICE weight_sum
## 2  2004_Jul  91942.57 0.540062   49654.69
## 3  2004_Aug  80327.29 0.642281   51592.69
## 4  2004_Sep  67387.86 0.983593   66282.22
## 5  2004_Oct  81307.07 1.326053  107817.49
## 6  2004_Nov 172584.29 2.154180  371777.62
## 7  2004_Dec 116803.64 1.861803  217465.37
## 8  2005_Jan  67193.46 1.111264   74669.68
## 9  2005_Feb  65111.32 1.109623   72249.02
## 10 2005_Mar  81389.36 1.255217  102161.30
## 11 2005_Apr  90105.43 0.967633   87188.99
## 12 2005_May 136057.89 0.785525  106876.88
## 13 2005_Jun 107802.75 0.904083   97462.63
## 14 2005_Jul 114619.79 0.855804   98092.07
## 15 2005_Aug 152440.71 0.796726  121453.48
## 16 2005_Sep 168344.79 0.758088  127620.16
## 17 2005_Oct 127759.21 0.991650  126692.42
## 18 2005_Nov 143462.07 2.007793  288042.14
## 19 2005_Dec 122154.54 1.797272  219544.93
## 20 2006_Jan  88740.25 1.008996   89538.56
## 21 2006_Feb  87689.43 1.055606   92565.49
## 22 2006_Mar 123369.75 0.991354  122303.10
## 23 2006_Apr 155312.04 0.856214  132980.34
## 24 2006_May 140064.93 0.854864  119736.47
## 25 2006_Jun 135176.79 0.958224  129529.64
## 26 2006_Jul 178455.57 0.723941  129191.30
## 27 2006_Aug 216618.21 0.771413  167102.11
## 28 2006_Sep 198734.93 0.832910  165528.31
## 29 2006_Oct 215120.07 0.956922  205853.13
## 30 2006_Nov 261636.86 1.912054  500263.80
## 31 2006_Dec 214985.07 1.834578  394406.88
## 32 2007_Jan 149417.79 0.983257  146916.08
## 33 2007_Feb 141932.71 0.898412  127514.05
## 34 2007_Mar 174709.43 0.877399  153289.88
## 35 2007_Apr 169455.00 0.946082  160318.33
## 36 2007_May 168988.82 0.948443  160276.26
## 37 2007_Jun 162072.32 1.191935  193179.67
## 38 2007_Jul 137077.07 1.318379  180719.53
## 39 2007_Aug 167178.39 1.047446  175110.34
## 40 2007_Sep 181669.89 1.015195  184430.37
## 41 2007_Oct 165415.43 1.314417  217424.85
## 42 2007_Nov 173307.93 2.376205  411815.17
## 43 2007_Dec 126072.79 2.193440  276533.09
## 44 2008_Jan 105059.14 1.443777  151681.97
## 45 2008_Feb 149494.57 1.077497  161079.95
## 46 2008_Mar 103524.57 1.722317  178302.13
## 47 2008_Apr 174153.64 1.104731  192392.93
## 48 2008_May 197059.00 1.033892  203737.72
## 49 2008_Jun 130458.93 1.460632  190552.49
## 50 2008_Jul 133018.07 1.309694  174212.97
## 51 2008_Aug 146845.21 1.248834  183385.30
## 52 2008_Sep  95496.96 1.597687  152574.26
## 53 2008_Oct  78102.11 1.838252  143571.35
## 54 2008_Nov 215058.29 1.976768  425120.34
## 55 2008_Dec 145347.68 1.793644  260701.99
## 56 2009_Jan  79226.18 1.655723  131176.61
## 57 2009_Feb  53561.14 1.751621   93818.82
## 58 2009_Mar  68385.39 1.823341  124689.89
## 59 2009_Apr  87508.82 1.696780  148483.22
## 60 2009_May  87781.79 1.627419  142857.75
## 61 2009_Jun 107918.57 1.477811  159483.25

#get month data

#author :Du
mass_month <- data.frame(mass$Month,mass$VOLUMN,mass$PRICE,mass$weight_sum)
mass_month
##    mass.Month mass.VOLUMN mass.PRICE mass.weight_sum
## 1    2004_Jul    91942.57   0.540062        49654.69
## 2    2004_Aug    80327.29   0.642281        51592.69
## 3    2004_Sep    67387.86   0.983593        66282.22
## 4    2004_Oct    81307.07   1.326053       107817.49
## 5    2004_Nov   172584.29   2.154180       371777.62
## 6    2004_Dec   116803.64   1.861803       217465.37
## 7    2005_Jan    67193.46   1.111264        74669.68
## 8    2005_Feb    65111.32   1.109623        72249.02
## 9    2005_Mar    81389.36   1.255217       102161.30
## 10   2005_Apr    90105.43   0.967633        87188.99
## 11   2005_May   136057.89   0.785525       106876.88
## 12   2005_Jun   107802.75   0.904083        97462.63
## 13   2005_Jul   114619.79   0.855804        98092.07
## 14   2005_Aug   152440.71   0.796726       121453.48
## 15   2005_Sep   168344.79   0.758088       127620.16
## 16   2005_Oct   127759.21   0.991650       126692.42
## 17   2005_Nov   143462.07   2.007793       288042.14
## 18   2005_Dec   122154.54   1.797272       219544.93
## 19   2006_Jan    88740.25   1.008996        89538.56
## 20   2006_Feb    87689.43   1.055606        92565.49
## 21   2006_Mar   123369.75   0.991354       122303.10
## 22   2006_Apr   155312.04   0.856214       132980.34
## 23   2006_May   140064.93   0.854864       119736.47
## 24   2006_Jun   135176.79   0.958224       129529.64
## 25   2006_Jul   178455.57   0.723941       129191.30
## 26   2006_Aug   216618.21   0.771413       167102.11
## 27   2006_Sep   198734.93   0.832910       165528.31
## 28   2006_Oct   215120.07   0.956922       205853.13
## 29   2006_Nov   261636.86   1.912054       500263.80
## 30   2006_Dec   214985.07   1.834578       394406.88
## 31   2007_Jan   149417.79   0.983257       146916.08
## 32   2007_Feb   141932.71   0.898412       127514.05
## 33   2007_Mar   174709.43   0.877399       153289.88
## 34   2007_Apr   169455.00   0.946082       160318.33
## 35   2007_May   168988.82   0.948443       160276.26
## 36   2007_Jun   162072.32   1.191935       193179.67
## 37   2007_Jul   137077.07   1.318379       180719.53
## 38   2007_Aug   167178.39   1.047446       175110.34
## 39   2007_Sep   181669.89   1.015195       184430.37
## 40   2007_Oct   165415.43   1.314417       217424.85
## 41   2007_Nov   173307.93   2.376205       411815.17
## 42   2007_Dec   126072.79   2.193440       276533.09
## 43   2008_Jan   105059.14   1.443777       151681.97
## 44   2008_Feb   149494.57   1.077497       161079.95
## 45   2008_Mar   103524.57   1.722317       178302.13
## 46   2008_Apr   174153.64   1.104731       192392.93
## 47   2008_May   197059.00   1.033892       203737.72
## 48   2008_Jun   130458.93   1.460632       190552.49
## 49   2008_Jul   133018.07   1.309694       174212.97
## 50   2008_Aug   146845.21   1.248834       183385.30
## 51   2008_Sep    95496.96   1.597687       152574.26
## 52   2008_Oct    78102.11   1.838252       143571.35
## 53   2008_Nov   215058.29   1.976768       425120.34
## 54   2008_Dec   145347.68   1.793644       260701.99
## 55   2009_Jan    79226.18   1.655723       131176.61
## 56   2009_Feb    53561.14   1.751621        93818.82
## 57   2009_Mar    68385.39   1.823341       124689.89
## 58   2009_Apr    87508.82   1.696780       148483.22
## 59   2009_May    87781.79   1.627419       142857.75
## 60   2009_Jun   107918.57   1.477811       159483.25
food_month <- data.frame(food$Month,food$VOLUMN,food$PRICE,food$weight_sum)
food_month
##    food.Month food.VOLUMN food.PRICE food.weight_sum
## 1    2004_Jul    14580047   1.492053        21754203
## 2    2004_Aug    14809138   1.425275        21107094
## 3    2004_Sep    15864934   1.416803        22477486
## 4    2004_Oct    17125286   1.565595        26811262
## 5    2004_Nov    20660633   2.405971        49708883
## 6    2004_Dec    15922816   2.365912        37671981
## 7    2005_Jan    12963245   1.482576        19218996
## 8    2005_Feb    12795191   1.398810        17898041
## 9    2005_Mar    14373304   1.471528        21150720
## 10   2005_Apr    13649767   1.397770        19079235
## 11   2005_May    13877405   1.423743        19757858
## 12   2005_Jun    13311289   1.508758        20083514
## 13   2005_Jul    13453651   1.475236        19847310
## 14   2005_Aug    14053983   1.383799        19447887
## 15   2005_Sep    15920217   1.303422        20750761
## 16   2005_Oct    16409994   1.581861        25958330
## 17   2005_Nov    21044428   2.453094        51623960
## 18   2005_Dec    16387732   2.316851        37967933
## 19   2006_Jan    13403200   1.484444        19896300
## 20   2006_Feb    13562728   1.382346        18748382
## 21   2006_Mar    15348004   1.402471        21525130
## 22   2006_Apr    14298923   1.517994        21705679
## 23   2006_May    13635635   1.535130        20932473
## 24   2006_Jun    13384882   1.626436        21769653
## 25   2006_Jul    13270426   1.632979        21670327
## 26   2006_Aug    13210034   1.661734        21951563
## 27   2006_Sep    13710296   1.670665        22905312
## 28   2006_Oct    14707820   1.877999        27621271
## 29   2006_Nov    19268179   2.801139        53972847
## 30   2006_Dec    14473257   2.550521        36914347
## 31   2007_Jan    11343783   1.807471        20503559
## 32   2007_Feb    11147826   1.688711        18825456
## 33   2007_Mar    12386796   1.716640        21263670
## 34   2007_Apr    11951513   1.785742        21342318
## 35   2007_May    12634818   1.681302        21242945
## 36   2007_Jun    12676694   1.738186        22034452
## 37   2007_Jul    12036229   1.776372        21380821
## 38   2007_Aug    12535094   1.705026        21372660
## 39   2007_Sep    13214117   1.668856        22052458
## 40   2007_Oct    14616720   1.807921        26425874
## 41   2007_Nov    18908816   2.837957        53662408
## 42   2007_Dec    13255560   2.554398        33859977
## 43   2008_Jan    11116519   1.852541        20593807
## 44   2008_Feb    10703282   1.802626        19294015
## 45   2008_Mar    11773157   1.843093        21699023
## 46   2008_Apr    11475396   1.776068        20381083
## 47   2008_May    11807544   1.835863        21677033
## 48   2008_Jun    11501571   1.928809        22184333
## 49   2008_Jul    11730317   1.828197        21445330
## 50   2008_Aug    11499209   1.869837        21501646
## 51   2008_Sep    11687785   1.880750        21981801
## 52   2008_Oct    12939811   2.137677        27661136
## 53   2008_Nov    17506533   3.144949        55057153
## 54   2008_Dec    12890286   2.885495        37194855
## 55   2009_Jan    10582184   2.120130        22435606
## 56   2009_Feb    10634505   1.839275        19559780
## 57   2009_Mar    13037014   1.838850        23973112
## 58   2009_Apr    12569873   1.840990        23141011
## 59   2009_May    12705934   1.802271        22899537
## 60   2009_Jun    12323965   1.915499        23606542
drug_month <- data.frame(drug$Month,drug$VOLUMN,drug$PRICE,drug$weight_sum)
drug_month
##    drug.Month drug.VOLUMN drug.PRICE drug.weight_sum
## 1    2004_Jul    342038.9   0.915870        313263.2
## 2    2004_Aug    353868.0   0.878705        310945.5
## 3    2004_Sep    353385.0   0.890999        314865.7
## 4    2004_Oct    382660.5   0.918136        351334.4
## 5    2004_Nov    396199.8   1.161077        460018.5
## 6    2004_Dec    374950.2   1.087886        407903.1
## 7    2005_Jan    373119.7   0.907105        338458.8
## 8    2005_Feb    363302.8   0.950365        345270.2
## 9    2005_Mar    373210.1   0.916675        342112.4
## 10   2005_Apr    356498.6   0.898380        320271.2
## 11   2005_May    373107.9   0.880589        328554.7
## 12   2005_Jun    316255.7   0.882175        278992.9
## 13   2005_Jul    309310.4   0.904569        279792.6
## 14   2005_Aug    318453.2   0.913966        291055.4
## 15   2005_Sep    322646.4   0.887374        286308.0
## 16   2005_Oct    362755.7   0.968978        351502.3
## 17   2005_Nov    376950.7   1.144221        431314.9
## 18   2005_Dec    351865.1   1.054984        371212.0
## 19   2006_Jan    338764.8   0.984137        333391.0
## 20   2006_Feb    325562.7   0.961842        313139.9
## 21   2006_Mar    366729.0   0.928599        340544.2
## 22   2006_Apr    353113.2   0.900744        318064.6
## 23   2006_May    355834.2   0.914624        325454.5
## 24   2006_Jun    328466.9   0.911823        299503.6
## 25   2006_Jul    317527.0   0.955292        303331.0
## 26   2006_Aug    330962.4   0.966118        319748.8
## 27   2006_Sep    345281.6   0.962259        332250.4
## 28   2006_Oct    344282.4   1.080196        371892.5
## 29   2006_Nov    339512.9   1.278718        434141.2
## 30   2006_Dec    315622.1   1.153374        364030.4
## 31   2007_Jan    303314.6   1.054472        319836.8
## 32   2007_Feb    307557.4   1.037891        319211.0
## 33   2007_Mar    343296.2   1.029150        353303.3
## 34   2007_Apr    337749.6   1.018466        343986.5
## 35   2007_May    349303.3   1.032396        360619.3
## 36   2007_Jun    319853.0   1.039404        332456.5
## 37   2007_Jul    308596.2   1.046101        322822.8
## 38   2007_Aug    312512.3   1.045203        326638.8
## 39   2007_Sep    336597.0   1.035852        348664.7
## 40   2007_Oct    364338.8   1.090308        397241.5
## 41   2007_Nov    362315.0   1.283790        465136.4
## 42   2007_Dec    342889.2   1.137519        390043.0
## 43   2008_Jan    339879.9   1.109045        376942.1
## 44   2008_Feb    355203.6   1.048227        372334.0
## 45   2008_Mar    406840.4   0.991352        403322.0
## 46   2008_Apr    332608.3   1.087043        361559.5
## 47   2008_May    350392.2   1.101599        385991.7
## 48   2008_Jun    314237.8   1.086986        341572.1
## 49   2008_Jul    310672.0   1.069949        332403.2
## 50   2008_Aug    301160.8   1.122488        338049.3
## 51   2008_Sep    320567.5   1.131190        362622.8
## 52   2008_Oct    324321.3   1.210744        392670.1
## 53   2008_Nov    328502.6   1.538714        505471.6
## 54   2008_Dec    321003.6   1.317642        422967.9
## 55   2009_Jan    307358.9   1.252852        385075.2
## 56   2009_Feb    288944.6   1.235202        356904.9
## 57   2009_Mar    325536.9   1.226621        399310.4
## 58   2009_Apr    316638.4   1.161370        367734.3
## 59   2009_May    299196.1   1.196255        357914.9
## 60   2009_Jun    274674.6   1.209833        332310.4
STL model
#author :Du
#mass
# Generate the time series object (the data start in July 2004, hence start = c(2004, 7))
ts_mass_month <- ts(mass_month$mass.weight_sum, start = c(2004, 7), frequency = 12)
fit_mass <- stl(ts_mass_month, s.window = "periodic")
plot(fit_mass)

fit_mass %>% forecast(method="naive") %>% autoplot() + ylab("sales")+
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))

#food
# Generate the time series object (the data start in July 2004, hence start = c(2004, 7))
ts_food_month <- ts(food_month$food.weight_sum, start = c(2004, 7), frequency = 12)
fit_food <- stl(ts_food_month, s.window = "periodic")
plot(fit_food)

fit_food %>% forecast(method="naive") %>% autoplot() + ylab("sales")+
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))

#drug
# Generate the time series object (the data start in July 2004, hence start = c(2004, 7))
ts_drug_month <- ts(drug_month$drug.weight_sum, start = c(2004, 7), frequency = 12)
fit_drug <- stl(ts_drug_month, s.window = "periodic")
plot(fit_drug)

fit_drug %>% forecast(method="naive") %>% autoplot() + ylab("sales")+
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))
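
As a quick check on the decomposition, the following minimal sketch rebuilds the mass series from the `mass.weight_sum` column printed above and reports which calendar month carries the largest seasonal effect. It assumes the series starts in July 2004 (the first row of the table) and uses only base R; the names `mass_sales`, `x`, `fit` and `peak_month` are illustrative and not part of the report's code.

```r
# mass.weight_sum values copied from the monthly table (2004_Jul .. 2009_Jun)
mass_sales <- c(
  49654.69, 51592.69, 66282.22, 107817.49, 371777.62, 217465.37,
  74669.68, 72249.02, 102161.30, 87188.99, 106876.88, 97462.63,
  98092.07, 121453.48, 127620.16, 126692.42, 288042.14, 219544.93,
  89538.56, 92565.49, 122303.10, 132980.34, 119736.47, 129529.64,
  129191.30, 167102.11, 165528.31, 205853.13, 500263.80, 394406.88,
  146916.08, 127514.05, 153289.88, 160318.33, 160276.26, 193179.67,
  180719.53, 175110.34, 184430.37, 217424.85, 411815.17, 276533.09,
  151681.97, 161079.95, 178302.13, 192392.93, 203737.72, 190552.49,
  174212.97, 183385.30, 152574.26, 143571.35, 425120.34, 260701.99,
  131176.61, 93818.82, 124689.89, 148483.22, 142857.75, 159483.25
)

# Monthly series whose first observation is July 2004
x <- ts(mass_sales, start = c(2004, 7), frequency = 12)

# STL with a periodic (fixed) seasonal pattern, as in the report
fit <- stl(x, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]

# Calendar month (1-12) at which the seasonal component is largest
peak_month <- cycle(x)[which.max(seasonal)]
month.abb[peak_month]
```

If the start month were off by one, the same peak would be labelled with the wrong calendar month, so this kind of check is a cheap way to validate the `start` argument.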

ETS model
#author :Du
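#mass
# forecast() on an stl object uses method = "ets" by default: an ETS model is fitted to the seasonally adjusted series and the seasonal component is added back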
fit_mass %>% forecast(h=36) %>%
  autoplot() +
  xlab("time") +
  ylab("sales")+
  ggtitle('mass cake predict') +
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))

#food
fit_food %>% forecast(h=36) %>%
  autoplot() +
  xlab("time") +
  ylab("sales")+
  ggtitle('food cake predict') +
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))

#drug
fit_drug %>% forecast(h=36) %>%
  autoplot() +
  xlab("time") +
  ylab("sales")+
  ggtitle('drug cake predict') +
  theme(text = element_text(family = "STHeiti"))+
  theme(plot.title = element_text(hjust = 0.5))

#author :Liao
data_mass_quarter=ts(mass_quarter$weight_sum,frequency=4,start=2004,end=2009)
data=data_mass_quarter
plot(data)

ndiffs(data)
## [1] 0
ddata <- diff(data)
plot(ddata)

ADF<-adf.test(ddata)
## Warning in adf.test(ddata): p-value smaller than printed p-value
ADF
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ddata
## Dickey-Fuller = -9.5978, Lag order = 2, p-value = 0.01
## alternative hypothesis: stationary

#####2.Model Order Selection and Fitting

# author: Liao
fit <- auto.arima(data)
fit
## Series: data 
## ARIMA(0,1,0)(1,1,0)[4] 
## 
## Coefficients:
##          sar1
##       -0.5622
## s.e.   0.1837
## 
## sigma^2 = 2.137e+10:  log likelihood = -213.23
## AIC=430.46   AICc=431.38   BIC=432
accuracy(fit)
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -29944.76 123545.7 83392.11 -9.999367 20.49734 0.6007077
##                    ACF1
## Training set -0.2865116

#####3.Model diagnosis

# author: Liao
qqnorm(fit$residuals)  #plot   
qqline(fit$residuals)  #add line

Box.test(fit$residuals, type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  fit$residuals
## X-squared = 1.9824, df = 1, p-value = 0.1591
# Ljung-Box test: the p-value (0.1591) is greater than 0.05, so the null hypothesis of no residual autocorrelation is not rejected; the residuals are consistent with white noise at lag 1
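
To make the reading of the Ljung-Box p-value concrete, here is a minimal sketch on simulated data (not the report's residuals). It contrasts a strongly autocorrelated AR(1) series with independent noise; the series names and the seed are arbitrary choices for illustration.

```r
set.seed(1)

# Strongly autocorrelated series: the kind of structure a good model should remove
ar_series <- arima.sim(model = list(ar = 0.9), n = 200)
lb_ar <- Box.test(ar_series, lag = 1, type = "Ljung-Box")

# Independent noise: the pattern we hope to see in model residuals
noise <- rnorm(200)
lb_noise <- Box.test(noise, lag = 1, type = "Ljung-Box")

# Null hypothesis: no autocorrelation up to the chosen lag.
# A small p-value rejects it, meaning autocorrelation remains.
c(ar = lb_ar$p.value, noise = lb_noise$p.value)
```

When the test is applied to ARIMA residuals, `Box.test` also accepts a `fitdf` argument for the number of estimated coefficients, which in turn calls for a lag larger than 1; the report's call keeps the defaults.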

HW-model

#author:Wang

mass_month <- data.frame(mass$Month,mass$VOLUMN,mass$PRICE,mass$weight_sum)
food_month <- data.frame(food$Month,food$VOLUMN,food$PRICE,food$weight_sum)
drug_month <- data.frame(drug$Month,drug$VOLUMN,drug$PRICE,drug$weight_sum)
# Create monthly data time series (the data start in July 2004)
ts_mass_month <- ts(mass_month$mass.weight_sum, start = c(2004, 7), frequency = 12)

# Draw a monthly data graph
autoplot(ts_mass_month)

# Forecasting monthly data using the Holt-Winters model
fc <- hw(subset(ts_mass_month,end=length(ts_mass_month)-35),
         damped = TRUE, seasonal="multiplicative", h=35)
autoplot(ts_mass_month) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="month forecasts"))

# Comparison of Holt-Winters additive and multiplicative methods for monthly data
aust <- window(ts_mass_month)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
  autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
  autolayer(fit2, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Year") +
  ylab("mass_month") +
  ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
  guides(colour=guide_legend(title="Forecast"))
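
The plot compares the two variants visually; a holdout comparison puts numbers on the difference. The sketch below is one possible way to do this with the forecast package. `compare_hw` is a hypothetical helper, and the built-in `AirPassengers` series is used only as a stand-in so that the example runs on its own.

```r
library(forecast)

# Hypothetical helper: hold out the last `h` observations, fit both
# Holt-Winters variants on the rest, and report test-set RMSE and MAPE.
compare_hw <- function(y, h = 12) {
  n <- length(y)
  train <- subset(y, end = n - h)
  test  <- subset(y, start = n - h + 1)
  fits <- list(
    additive       = hw(train, seasonal = "additive", h = h),
    multiplicative = hw(train, seasonal = "multiplicative", h = h)
  )
  # One row per method, one column per accuracy measure
  t(sapply(fits, function(fc) accuracy(fc, test)["Test set", c("RMSE", "MAPE")]))
}

# Stand-in data (assumption): replace AirPassengers with ts_mass_month,
# ts_food_month or ts_drug_month when running inside the report.
res <- compare_hw(AirPassengers, h = 12)
res
```

With only five years of data, a single 12-month holdout is a rough guide at best, so the result should be read alongside the plots rather than as a definitive ranking.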

# Create quarterly data time series
ts_mass_quarter  <- ts(mass_quarter$weight_sum,frequency=4,start=2004,end=2009)

# Draw quarterly data graphs
autoplot(ts_mass_quarter)
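
How `mass_quarter` was built is not shown in this section. Assuming the quarterly totals are sums of three consecutive monthly values, the sketch below shows how base R's `aggregate()` labels the quarters when the monthly series begins in July 2004. It uses the first six `mass.weight_sum` values from the monthly table; `monthly` and `quarterly` are illustrative names.

```r
# First six mass.weight_sum values from the monthly table (2004_Jul .. 2004_Dec)
monthly <- ts(c(49654.69, 51592.69, 66282.22, 107817.49, 371777.62, 217465.37),
              start = c(2004, 7), frequency = 12)

# Sum each block of three months into one quarter
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)

start(quarterly)  # year and quarter of the first observation
quarterly
```

Under that assumption the first quarterly observation is the third quarter of 2004, so `start = c(2004, 3)` would be the matching argument for `ts()`, whereas `start = 2004` labels the first value as the first quarter.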

# Quarterly data forecast with Holt-Winters model: hold out the last 10 quarters and forecast them
fc <- hw(subset(ts_mass_quarter,end=length(ts_mass_quarter)-10),
         damped = TRUE, seasonal="multiplicative", h=10)
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi = phi, :
## Not enough data to use damping
autoplot(ts_mass_quarter) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="Quarterly forecasts"))

# Comparison of Holt-Winters additive and multiplicative methods for quarterly data
aust <- window(ts_mass_quarter)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
  autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
  autolayer(fit2, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Year") +
  ylab("mass_quarter") +
  ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
  guides(colour=guide_legend(title="Forecast"))

####HW-MODEL

# author:wang
### Food data

# Create monthly data time series (the data start in July 2004)
ts_food_month <- ts(food_month$food.weight_sum, start = c(2004, 7), frequency = 12)

# Draw a monthly data graph
autoplot(ts_food_month)

# Forecasting monthly data using the Holt-Winters model
fc <- hw(subset(ts_food_month,end=length(ts_food_month)-35),
         damped = TRUE, seasonal="multiplicative", h=35)
autoplot(ts_food_month) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="month forecasts"))

# Comparison of Holt-Winters additive and multiplicative methods for monthly data
aust <- window(ts_food_month)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
  autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
  autolayer(fit2, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Year") +
  ylab("food_month") +
  ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
  guides(colour=guide_legend(title="Forecast"))

# Create quarterly data time series
ts_food_quarter  <- ts(food_quarter$weight_sum,frequency=4,start=2004,end=2009)

# Draw quarterly data graphs
autoplot(ts_food_quarter)

# Quarterly data forecast with Holt-Winters model: hold out the last 10 quarters and forecast them
fc <- hw(subset(ts_food_quarter,end=length(ts_food_quarter)-10),
         damped = TRUE, seasonal="multiplicative", h=10)
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi = phi, :
## Not enough data to use damping
autoplot(ts_food_quarter) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="Quarterly forecasts"))

# Comparison of Holt-Winters additive and multiplicative methods for quarterly data
aust <- window(ts_food_quarter)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
  autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
  autolayer(fit2, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Year") +
  ylab("food_quarter") +
  ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
  guides(colour=guide_legend(title="Forecast"))

####HW-MODEL

# author:wang
### Drug data

# Create monthly data time series (the data start in July 2004)
ts_drug_month <- ts(drug_month$drug.weight_sum, start = c(2004, 7), frequency = 12)

# Draw a monthly data graph
autoplot(ts_drug_month)

# Forecasting monthly data using the Holt-Winters model
fc <- hw(subset(ts_drug_month,end=length(ts_drug_month)-35),
         damped = TRUE, seasonal="multiplicative", h=35)
autoplot(ts_drug_month) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="month forecasts"))

# Comparison of Holt-Winters additive and multiplicative methods for monthly data
aust <- window(ts_drug_month)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
  autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
  autolayer(fit2, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Year") +
  ylab("drug_month") +
  ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
  guides(colour=guide_legend(title="Forecast"))

# Create quarterly data time series
ts_drug_quarter  <- ts(drug_quarter$weight_sum,frequency=4,start=2004,end=2009)

# Draw quarterly data graphs
autoplot(ts_drug_quarter)

# Quarterly data forecast with Holt-Winters model: hold out the last 10 quarters and forecast them
fc <- hw(subset(ts_drug_quarter,end=length(ts_drug_quarter)-10),
         damped = TRUE, seasonal="multiplicative", h=10)
## Warning in ets(x, "MAM", alpha = alpha, beta = beta, gamma = gamma, phi = phi, :
## Not enough data to use damping
autoplot(ts_drug_quarter) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="Quarterly forecasts"))

# Comparison of Holt-Winters additive and multiplicative methods for quarterly data
aust <- window(ts_drug_quarter)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
  autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
  autolayer(fit2, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Year") +
  ylab("drug_quarter") +
  ggtitle("Comparison of Holt-Winters' Additive and Multiplicative Methods") +
  guides(colour=guide_legend(title="Forecast"))

time regression

# author: Ding
# 2004 and 2009 each have only six months of data; extract them so the missing months can be filled in with the monthly average
mass_2004 <- filter(mass, substring(mass$Month, 1, 4) == 2004)
mass_2009 <- filter(mass, substring(mass$Month, 1, 4) == 2009)
food_2004 <- filter(food, substring(food$Month, 1, 4) == 2004)
food_2009 <- filter(food, substring(food$Month, 1, 4) == 2009)
drug_2004 <- filter(drug, substring(drug$Month, 1, 4) == 2004)
drug_2009 <- filter(drug, substring(drug$Month, 1, 4) == 2009)

exp04m <- sum(mass_2004$weight_sum)/6
exp04m
## [1] 144098.3
exp09m <- sum(mass_2009$weight_sum)/6
exp09m
## [1] 133418.3
exp04f <- sum(food_2004$weight_sum)/6
exp04f
## [1] 29921818
exp09f <- sum(food_2009$weight_sum)/6
exp09f
## [1] 22602598
exp04d <- sum(drug_2004$weight_sum)/6
exp04d
## [1] 359721.7
exp09d <- sum(drug_2009$weight_sum)/6
exp09d
## [1] 366541.7
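# Add the estimate for the six missing months (6 x monthly average) to the partial-year totals of 2004 (row 1) and 2009 (row 6)
mass_year[1,3]<-mass_year[1,3] + exp04m*6
mass_year[6,3]<-mass_year[6,3] + exp09m*6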
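
The arithmetic behind the fill-in can be traced by hand. The sketch below repeats it for mass in 2004, using the six `mass.weight_sum` values from the monthly table. It assumes that `mass_year[1,3]` initially holds the sum of the six observed months, which is not shown in this section; the variable names are illustrative.

```r
# mass.weight_sum for the six observed months of 2004 (Jul .. Dec)
obs_2004 <- c(49654.69, 51592.69, 66282.22, 107817.49, 371777.62, 217465.37)

observed_total <- sum(obs_2004)
monthly_avg <- observed_total / 6   # same quantity as exp04m

# Add the average once for each of the six missing months (Jan .. Jun)
annual_estimate <- observed_total + monthly_avg * 6

round(monthly_avg, 1)
annual_estimate
```

Because the estimate is exactly twice the observed half-year total, it inherits that half-year's seasonality: the observed months of 2004 include the November and December peaks, while those of 2009 (January to June) do not, so the two filled-in years are probably biased in opposite directions.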


train_mass <- mass_year
  

library(tsibble)
data_df_year_ts<-train_mass%>%
  mutate(data = as.integer(year)) %>%
  as_tsibble(index =data )

fit_trends <- data_df_year_ts %>%
  model(
    linear = TSLM(weight_sumy  ~ trend()),
  )
fc_trends <- fit_trends %>% forecast(h = 3)

data_df_year_ts %>%
  autoplot(weight_sumy ) +
  geom_line(data = fitted(fit_trends),
            aes(y = .fitted, x= data, colour = .model)) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  labs(y = "weight_sum",
       title = "mass_year: linear trend with 3-year forecast")
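
`TSLM(weight_sumy ~ trend())` is an ordinary least-squares fit on a time index, so the same idea can be sketched with base R's `lm()`. The annual totals below are hypothetical placeholders, not the report's `mass_year` values; the point is only to show how the 95% prediction interval behaves when three future years are extrapolated from six points.

```r
# Hypothetical annual totals (NOT the report's mass_year values)
annual <- data.frame(
  year  = 2004:2009,
  total = c(1.7e6, 1.6e6, 2.3e6, 2.4e6, 2.3e6, 1.8e6)
)

# Linear trend fitted by ordinary least squares
fit <- lm(total ~ year, data = annual)

# Forecast three years ahead with 95% prediction intervals
future <- data.frame(year = 2010:2012)
fc <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
cbind(future, fc)

# Width of the interval at each forecast horizon
fc[, "upr"] - fc[, "lwr"]
```

The interval widens with every additional year, which is one way to see why the annual forecast should be read as a trend indication rather than as point predictions.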

# food_year


food_year[1,3]<-food_year[1,3] + exp04f*6
food_year[6,3]<-food_year[6,3] + exp09f*6


train_mass2 <- food_year


library(tsibble)
data_df_year_ts2<-train_mass2%>%
  mutate(data = as.integer(year)) %>%
  as_tsibble(index =data )

fit_trends2 <- data_df_year_ts2 %>%
  model(
    linear = TSLM(weight_sumy   ~ trend()),
  )
fc_trends2 <- fit_trends2 %>% forecast(h = 3)
data_df_year_ts2 %>%
  autoplot(weight_sumy  ) +
  geom_line(data = fitted(fit_trends2),
            aes(y = .fitted,x= data, colour = .model)) +
  autolayer(fc_trends2, alpha = 0.5, level = 95) +
  labs(y = "weight_sum",
       title = "food_year: linear trend with 3-year forecast")

#drug_year 

drug_year[1,3]<-drug_year[1,3] + exp04d*6
drug_year[6,3]<-drug_year[6,3] + exp09d*6

train_mass3 <- drug_year


library(tsibble)
data_df_year_ts3<-train_mass3%>%
  mutate(data = as.integer(year)) %>%
  as_tsibble(index =data )

fit_trends3 <- data_df_year_ts3 %>%
  model(
    linear = TSLM(weight_sumy   ~ trend()),
  )
fc_trends3 <- fit_trends3 %>% forecast(h = 3)
data_df_year_ts3 %>%
  autoplot(weight_sumy  ) +
  geom_line(data = fitted(fit_trends3),
            aes(y = .fitted,x= data, colour = .model)) +
  autolayer(fc_trends3, alpha = 0.5, level = 95) +
  labs(y = "weight_sum",
       title = "drug_year: linear trend with 3-year forecast")

other model set

Traditional models combined with machine learning.
Model set: auto_arima_xgboost, randomForest, earth, prophet_xgboost, stlm_ets, stlm_arima, prophet.
Sys.setlocale('LC_TIME', 'C')
## [1] "C"
month <- mass$Month
month <- str_c('1_',month)
month <- as.Date(month,format='%d_%Y_%b')

mass
##       Month    VOLUMN    PRICE weight_sum
## 2  2004_Jul  91942.57 0.540062   49654.69
## 3  2004_Aug  80327.29 0.642281   51592.69
## 4  2004_Sep  67387.86 0.983593   66282.22
## 5  2004_Oct  81307.07 1.326053  107817.49
## 6  2004_Nov 172584.29 2.154180  371777.62
## 7  2004_Dec 116803.64 1.861803  217465.37
## 8  2005_Jan  67193.46 1.111264   74669.68
## 9  2005_Feb  65111.32 1.109623   72249.02
## 10 2005_Mar  81389.36 1.255217  102161.30
## 11 2005_Apr  90105.43 0.967633   87188.99
## 12 2005_May 136057.89 0.785525  106876.88
## 13 2005_Jun 107802.75 0.904083   97462.63
## 14 2005_Jul 114619.79 0.855804   98092.07
## 15 2005_Aug 152440.71 0.796726  121453.48
## 16 2005_Sep 168344.79 0.758088  127620.16
## 17 2005_Oct 127759.21 0.991650  126692.42
## 18 2005_Nov 143462.07 2.007793  288042.14
## 19 2005_Dec 122154.54 1.797272  219544.93
## 20 2006_Jan  88740.25 1.008996   89538.56
## 21 2006_Feb  87689.43 1.055606   92565.49
## 22 2006_Mar 123369.75 0.991354  122303.10
## 23 2006_Apr 155312.04 0.856214  132980.34
## 24 2006_May 140064.93 0.854864  119736.47
## 25 2006_Jun 135176.79 0.958224  129529.64
## 26 2006_Jul 178455.57 0.723941  129191.30
## 27 2006_Aug 216618.21 0.771413  167102.11
## 28 2006_Sep 198734.93 0.832910  165528.31
## 29 2006_Oct 215120.07 0.956922  205853.13
## 30 2006_Nov 261636.86 1.912054  500263.80
## 31 2006_Dec 214985.07 1.834578  394406.88
## 32 2007_Jan 149417.79 0.983257  146916.08
## 33 2007_Feb 141932.71 0.898412  127514.05
## 34 2007_Mar 174709.43 0.877399  153289.88
## 35 2007_Apr 169455.00 0.946082  160318.33
## 36 2007_May 168988.82 0.948443  160276.26
## 37 2007_Jun 162072.32 1.191935  193179.67
## 38 2007_Jul 137077.07 1.318379  180719.53
## 39 2007_Aug 167178.39 1.047446  175110.34
## 40 2007_Sep 181669.89 1.015195  184430.37
## 41 2007_Oct 165415.43 1.314417  217424.85
## 42 2007_Nov 173307.93 2.376205  411815.17
## 43 2007_Dec 126072.79 2.193440  276533.09
## 44 2008_Jan 105059.14 1.443777  151681.97
## 45 2008_Feb 149494.57 1.077497  161079.95
## 46 2008_Mar 103524.57 1.722317  178302.13
## 47 2008_Apr 174153.64 1.104731  192392.93
## 48 2008_May 197059.00 1.033892  203737.72
## 49 2008_Jun 130458.93 1.460632  190552.49
## 50 2008_Jul 133018.07 1.309694  174212.97
## 51 2008_Aug 146845.21 1.248834  183385.30
## 52 2008_Sep  95496.96 1.597687  152574.26
## 53 2008_Oct  78102.11 1.838252  143571.35
## 54 2008_Nov 215058.29 1.976768  425120.34
## 55 2008_Dec 145347.68 1.793644  260701.99
## 56 2009_Jan  79226.18 1.655723  131176.61
## 57 2009_Feb  53561.14 1.751621   93818.82
## 58 2009_Mar  68385.39 1.823341  124689.89
## 59 2009_Apr  87508.82 1.696780  148483.22
## 60 2009_May  87781.79 1.627419  142857.75
## 61 2009_Jun 107918.57 1.477811  159483.25
mass$ds <- month
mass$y  <-mass$weight_sum



p_mass <-data.frame(mass['ds'],mass['y'])
colnames(mass) <- c('ds','y','PRICE','weight_sum','date','sum')
# author Cai.

# mass forecast 
r_mass <-data.frame(p_mass['ds'],p_mass['y'])
# Data visualisation
r_mass %>%
  plot_time_series(ds,y)
# Split data 90/10 (prop = 0.9)
splits <- initial_time_split(r_mass, prop = 0.9)

recipe_spec <- recipe(y ~ ds, training(splits)) %>%
  step_timeseries_signature(ds) %>%
  # step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())

recipe_spec %>% prep() %>% juice()
## # A tibble: 54 × 44
##    ds               y ds_index.num ds_year ds_year.iso ds_half ds_quarter
##    <date>       <dbl>        <dbl>   <int>       <int>   <int>      <int>
##  1 2004-07-01  49655.   1088640000    2004        2004       2          3
##  2 2004-08-01  51593.   1091318400    2004        2004       2          3
##  3 2004-09-01  66282.   1093996800    2004        2004       2          3
##  4 2004-10-01 107817.   1096588800    2004        2004       2          4
##  5 2004-11-01 371778.   1099267200    2004        2004       2          4
##  6 2004-12-01 217465.   1101859200    2004        2004       2          4
##  7 2005-01-01  74670.   1104537600    2005        2004       1          1
##  8 2005-02-01  72249.   1107216000    2005        2005       1          1
##  9 2005-03-01 102161.   1109635200    2005        2005       1          1
## 10 2005-04-01  87189.   1112313600    2005        2005       1          2
## # … with 44 more rows, and 37 more variables: ds_month <int>,
## #   ds_month.xts <int>, ds_day <int>, ds_hour <int>, ds_minute <int>,
## #   ds_second <int>, ds_hour12 <int>, ds_am.pm <int>, ds_wday <int>,
## #   ds_wday.xts <int>, ds_mday <int>, ds_qday <int>, ds_yday <int>,
## #   ds_mweek <int>, ds_week <int>, ds_week.iso <int>, ds_week2 <int>,
## #   ds_week3 <int>, ds_week4 <int>, ds_mday7 <int>, ds_month.lbl_01 <dbl>,
## #   ds_month.lbl_02 <dbl>, ds_month.lbl_03 <dbl>, ds_month.lbl_04 <dbl>, …
# arima_boost

model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.0000015
) %>%
  set_engine(engine = "auto_arima_xgboost") %>%
  fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = F),
      data = training(splits))
## frequency = 12 observations per 1 year
# random forest
model_spec_rf <- rand_forest(trees = 1000, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(ds)) %>%
  fit(training(splits))
# mars
model_spec_mars <- mars(mode = "regression") %>%
  set_engine("earth") 

recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
  step_date(ds, features = "month", ordinal = FALSE) %>%
  step_mutate(ds_num = as.numeric(ds)) %>%
  step_normalize(ds_num) %>%
  step_rm(ds)

wflw_fit_mars <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(model_spec_mars) %>%
  fit(training(splits))

# Model Spec
model_spec <- prophet_boost(
  learn_rate = 0.1
) %>%
  set_engine("prophet_xgboost")

# Fit Spec
if (TRUE) {
  model_fit <- model_spec %>%
    fit(log(y) ~ ds + as.numeric(ds) + month(ds, label = TRUE),
        data = training(splits))
  model_fit
}
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## parsnip model object
## 
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
##  - growth: 'linear'
##  - n.changepoints: 25
##  - changepoint.range: 0.8
##  - yearly.seasonality: 'auto'
##  - weekly.seasonality: 'auto'
##  - daily.seasonality: 'auto'
##  - seasonality.mode: 'additive'
##  - changepoint.prior.scale: 0.05
##  - seasonality.prior.scale: 10
##  - holidays.prior.scale: 10
##  - logistic_cap: NULL
##  - logistic_floor: NULL
## 
## ---
## Model 2: XGBoost Errors
## 
## xgboost::xgb.train(params = list(eta = 0.1, max_depth = 6, gamma = 0, 
##     colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, 
##     subsample = 1, objective = "reg:squarederror"), data = x$data, 
##     nrounds = 15, watchlist = x$watchlist, verbose = 0, nthread = 1)
# Model Spec
model_spec <- seasonal_reg() %>%
  set_engine("stlm_ets")

# Fit Spec
model_fit_ses <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- seasonal_reg() %>%
  set_engine("stlm_arima")

# Fit Spec
model_fit_sta <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- prophet_reg() %>%
  set_engine("prophet")

# Fit Spec
model_fit_p <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- modeltime_table(
  model_fit_arima_boosted,
  wflw_fit_mars,
  workflow_fit_rf,
  model_fit,
  model_fit_ses,
  model_fit_sta,
  model_fit_p)

models_tbl
## # Modeltime Table
## # A tibble: 7 × 3
##   .model_id .model     .model_desc                              
##       <int> <list>     <chr>                                    
## 1         1 <fit[+]>   ARIMA(1,0,0)(1,1,0)[12] W/ XGBOOST ERRORS
## 2         2 <workflow> EARTH                                    
## 3         3 <workflow> RANDOMFOREST                             
## 4         4 <fit[+]>   PROPHET W/ XGBOOST ERRORS                
## 5         5 <fit[+]>   SEASONAL DECOMP: ETS(A,N,N)              
## 6         6 <fit[+]>   SEASONAL DECOMP: ARIMA(0,1,0)            
## 7         7 <fit[+]>   PROPHET
calibration_table <- models_tbl %>%
  modeltime_calibrate(testing(splits))

calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table
.model_id  .model_desc                                .type       mae   mape  mase  smape      rmse   rsq
1          ARIMA(1,0,0)(1,1,0)[12] W/ XGBOOST ERRORS  Test   38137.72  29.86  1.67  36.84  41839.34  0.70
2          EARTH                                      Test   27460.44  24.25  1.20  19.84  34975.66  0.52
3          RANDOMFOREST                               Test   17449.68  15.24  0.76  13.50  22710.56  0.45
4          PROPHET W/ XGBOOST ERRORS                  Test       0.10   0.82  0.51   0.82      0.11  0.61
5          SEASONAL DECOMP: ETS(A,N,N)                Test       0.11   0.91  0.57   0.91      0.11  0.65
6          SEASONAL DECOMP: ARIMA(0,1,0)              Test       0.11   0.91  0.57   0.91      0.11  0.65
7          PROPHET                                    Test       0.11   0.95  0.60   0.96      0.14  0.61
calibration_table %>%
  modeltime_forecast(actual_data = r_mass) %>%
  plot_modeltime_forecast(.interactive = TRUE)
## Using '.calibration_data' to forecast.
refit_tbl <- calibration_table %>%
  modeltime_refit(data = r_mass)
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 12 observations per 1 year
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# forecast 36 months
#(Removing the first 3 models from the display shows a more detailed prediction with less error)
refit_tbl %>%
  modeltime_forecast(h = "36 months", actual_data = r_mass) %>%
  filter(.model_desc != 'ACTUAL') %>%
  plot_modeltime_forecast(
    .legend_max_width = 25, # For mobile screens
    .interactive      = TRUE
  )
# author:Cai
# food predict
food$ds <- month
food$y  <-food$weight_sum



r_food <-data.frame(food['ds'],food['y'])

# Data visualisation
r_food %>%
  plot_time_series(ds,y)
# Split data 90/10 (prop = 0.9)
splits <- initial_time_split(r_food, prop = 0.9)

recipe_spec <- recipe(y ~ ds, training(splits)) %>%
  step_timeseries_signature(ds) %>%
  step_fourier(ds, period = 91.25, K = 1) %>%
  step_dummy(all_nominal())

recipe_spec %>% prep() %>% juice()
## # A tibble: 54 × 46
##    ds                 y ds_index.num ds_year ds_year.iso ds_half ds_quarter
##    <date>         <dbl>        <dbl>   <int>       <int>   <int>      <int>
##  1 2004-07-01 21754203.   1088640000    2004        2004       2          3
##  2 2004-08-01 21107094.   1091318400    2004        2004       2          3
##  3 2004-09-01 22477486.   1093996800    2004        2004       2          3
##  4 2004-10-01 26811262.   1096588800    2004        2004       2          4
##  5 2004-11-01 49708883.   1099267200    2004        2004       2          4
##  6 2004-12-01 37671981.   1101859200    2004        2004       2          4
##  7 2005-01-01 19218996.   1104537600    2005        2004       1          1
##  8 2005-02-01 17898041.   1107216000    2005        2005       1          1
##  9 2005-03-01 21150720.   1109635200    2005        2005       1          1
## 10 2005-04-01 19079235.   1112313600    2005        2005       1          2
## # … with 44 more rows, and 39 more variables: ds_month <int>,
## #   ds_month.xts <int>, ds_day <int>, ds_hour <int>, ds_minute <int>,
## #   ds_second <int>, ds_hour12 <int>, ds_am.pm <int>, ds_wday <int>,
## #   ds_wday.xts <int>, ds_mday <int>, ds_qday <int>, ds_yday <int>,
## #   ds_mweek <int>, ds_week <int>, ds_week.iso <int>, ds_week2 <int>,
## #   ds_week3 <int>, ds_week4 <int>, ds_mday7 <int>, ds_sin91.25_K1 <dbl>,
## #   ds_cos91.25_K1 <dbl>, ds_month.lbl_01 <dbl>, ds_month.lbl_02 <dbl>, …
# arima_boost

model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.000015
) %>%
  set_engine(engine = "auto_arima_xgboost") %>%
  fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = F),
      data = training(splits))
## frequency = 12 observations per 1 year
# random forest
model_spec_rf <- rand_forest(trees = 1000, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(ds)) %>%
  fit(training(splits))
# mars
model_spec_mars <- mars(mode = "regression") %>%
  set_engine("earth") 

recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
  step_date(ds, features = "month", ordinal = FALSE) %>%
  step_mutate(ds_num = as.numeric(ds)) %>%
  step_normalize(ds_num) %>%
  step_rm(ds)

wflw_fit_mars <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(model_spec_mars) %>%
  fit(training(splits))

# Model Spec
model_fit_pro_boost <- prophet_boost(
  learn_rate = 0.1
) %>%
  set_engine("prophet_xgboost")

# Fit Spec
if (TRUE) {
  model_fit <- model_fit_pro_boost %>%
    fit(log(y) ~ ds + as.numeric(ds) + month(ds, label = TRUE),
        data = training(splits))
  model_fit
}
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## parsnip model object
## 
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
##  - growth: 'linear'
##  - n.changepoints: 25
##  - changepoint.range: 0.8
##  - yearly.seasonality: 'auto'
##  - weekly.seasonality: 'auto'
##  - daily.seasonality: 'auto'
##  - seasonality.mode: 'additive'
##  - changepoint.prior.scale: 0.05
##  - seasonality.prior.scale: 10
##  - holidays.prior.scale: 10
##  - logistic_cap: NULL
##  - logistic_floor: NULL
## 
## ---
## Model 2: XGBoost Errors
## 
## xgboost::xgb.train(params = list(eta = 0.1, max_depth = 6, gamma = 0, 
##     colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, 
##     subsample = 1, objective = "reg:squarederror"), data = x$data, 
##     nrounds = 15, watchlist = x$watchlist, verbose = 0, nthread = 1)
# Model Spec
model_spec <- seasonal_reg() %>%
  set_engine("stlm_ets")

# Fit Spec
model_fit_ses <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year
model_spec <- seasonal_reg() %>%
  set_engine("stlm_arima")

# Fit Spec
model_fit_sta <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year

model_spec <- prophet_reg() %>%
  set_engine("prophet")

# Fit Spec
model_fit_p <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- modeltime_table(
  model_fit_arima_boosted,
  wflw_fit_mars,
  workflow_fit_rf,
  model_fit,
  model_fit_ses,
  model_fit_sta,
  model_fit_p
)

models_tbl
## # Modeltime Table
## # A tibble: 7 × 3
##   .model_id .model     .model_desc                              
##       <int> <list>     <chr>                                    
## 1         1 <fit[+]>   ARIMA(2,0,0)(0,1,0)[12] W/ XGBOOST ERRORS
## 2         2 <workflow> EARTH                                    
## 3         3 <workflow> RANDOMFOREST                             
## 4         4 <fit[+]>   PROPHET W/ XGBOOST ERRORS                
## 5         5 <fit[+]>   SEASONAL DECOMP: ETS(A,N,N)              
## 6         6 <fit[+]>   SEASONAL DECOMP: ARIMA(0,1,3)            
## 7         7 <fit[+]>   PROPHET
calibration_table <- models_tbl %>%
  modeltime_calibrate(testing(splits))

calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = TRUE)
calibration_table %>%
  modeltime_forecast(actual_data = r_food) %>%
  plot_modeltime_forecast(.interactive = TRUE)
## Using '.calibration_data' to forecast.
refit_tbl <- calibration_table %>%
  modeltime_refit(data = r_food)
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 12 observations per 1 year
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#(Removing the first 3 models from the display shows a more detailed prediction with less error)
refit_tbl %>%
  modeltime_forecast(h = "36 months", actual_data = r_food) %>%
  filter(.model_desc != 'ACTUAL') %>%
  plot_modeltime_forecast(
    .legend_max_width = 25, # For mobile screens
    .interactive      = TRUE
  )
# author:Cai
# drug predict
drug$ds <- month
drug$y  <-drug$weight_sum



r_drug <-data.frame(drug['ds'],drug['y'])

# Data visualisation
r_drug %>%
  plot_time_series(ds,y)
# Split data 90/10 (prop = 0.9)
splits <- initial_time_split(r_drug, prop = 0.9)

recipe_spec <- recipe(y ~ ds, training(splits)) %>%
  step_timeseries_signature(ds) %>%
  step_fourier(ds, period = 365, K = 8) %>%
  step_dummy(all_nominal())

recipe_spec %>% prep() %>% juice()
## # A tibble: 54 × 60
##    ds               y ds_index.num ds_year ds_year.iso ds_half ds_quarter
##    <date>       <dbl>        <dbl>   <int>       <int>   <int>      <int>
##  1 2004-07-01 313263.   1088640000    2004        2004       2          3
##  2 2004-08-01 310946.   1091318400    2004        2004       2          3
##  3 2004-09-01 314866.   1093996800    2004        2004       2          3
##  4 2004-10-01 351334.   1096588800    2004        2004       2          4
##  5 2004-11-01 460019.   1099267200    2004        2004       2          4
##  6 2004-12-01 407903.   1101859200    2004        2004       2          4
##  7 2005-01-01 338459.   1104537600    2005        2004       1          1
##  8 2005-02-01 345270.   1107216000    2005        2005       1          1
##  9 2005-03-01 342112.   1109635200    2005        2005       1          1
## 10 2005-04-01 320271.   1112313600    2005        2005       1          2
## # … with 44 more rows, and 53 more variables: ds_month <int>,
## #   ds_month.xts <int>, ds_day <int>, ds_hour <int>, ds_minute <int>,
## #   ds_second <int>, ds_hour12 <int>, ds_am.pm <int>, ds_wday <int>,
## #   ds_wday.xts <int>, ds_mday <int>, ds_qday <int>, ds_yday <int>,
## #   ds_mweek <int>, ds_week <int>, ds_week.iso <int>, ds_week2 <int>,
## #   ds_week3 <int>, ds_week4 <int>, ds_mday7 <int>, ds_sin365_K1 <dbl>,
## #   ds_cos365_K1 <dbl>, ds_sin365_K2 <dbl>, ds_cos365_K2 <dbl>, …
# arima_boost

model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.00015
) %>%
  set_engine(engine = "auto_arima_xgboost") %>%
  fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = F),
      data = training(splits))
## frequency = 12 observations per 1 year
# random forest
model_spec_rf <- rand_forest(trees = 1000, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(ds)) %>%
  fit(training(splits))
# mars
model_spec_mars <- mars(mode = "regression") %>%
  set_engine("earth") 

recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
  step_date(ds, features = "month", ordinal = FALSE) %>%
  step_mutate(ds_num = as.numeric(ds)) %>%
  step_normalize(ds_num) %>%
  step_rm(ds)

wflw_fit_mars <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(model_spec_mars) %>%
  fit(training(splits))

# Model Spec
model_fit_pro_boost <- prophet_boost(
  learn_rate = 0.1
) %>%
  set_engine("prophet_xgboost")

# Fit Spec
if (TRUE) {
  model_fit <- model_fit_pro_boost %>%
    fit(log(y) ~ ds + as.numeric(ds) + month(ds, label = TRUE),
        data = training(splits))
  model_fit
}
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## parsnip model object
## 
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
##  - growth: 'linear'
##  - n.changepoints: 25
##  - changepoint.range: 0.8
##  - yearly.seasonality: 'auto'
##  - weekly.seasonality: 'auto'
##  - daily.seasonality: 'auto'
##  - seasonality.mode: 'additive'
##  - changepoint.prior.scale: 0.05
##  - seasonality.prior.scale: 10
##  - holidays.prior.scale: 10
##  - logistic_cap: NULL
##  - logistic_floor: NULL
## 
## ---
## Model 2: XGBoost Errors
## 
## xgboost::xgb.train(params = list(eta = 0.1, max_depth = 6, gamma = 0, 
##     colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, 
##     subsample = 1, objective = "reg:squarederror"), data = x$data, 
##     nrounds = 15, watchlist = x$watchlist, verbose = 0, nthread = 1)
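# Fit Spec
# NOTE: model_spec is not redefined as seasonal_reg() with engine "stlm_ets" here,
# so it still holds the prophet_reg() spec from the food section. Model 5 in the
# drug table below is therefore a second PROPHET fit, not stlm_ets.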
model_fit_ses <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_spec <- seasonal_reg() %>%
  set_engine("stlm_arima")

# Fit Spec
model_fit_sta <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## frequency = 12 observations per 1 year

model_spec <- prophet_reg() %>%
  set_engine("prophet")

# Fit Spec
model_fit_p <- model_spec %>%
  fit(log(y) ~ ds, data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
models_tbl <- modeltime_table(
  model_fit_arima_boosted,
  wflw_fit_mars,
  workflow_fit_rf,
  model_fit,
  model_fit_ses,
  model_fit_sta,
  model_fit_p
)

models_tbl
## # Modeltime Table
## # A tibble: 7 × 3
##   .model_id .model     .model_desc                              
##       <int> <list>     <chr>                                    
## 1         1 <fit[+]>   ARIMA(1,1,0)(0,1,0)[12] W/ XGBOOST ERRORS
## 2         2 <workflow> EARTH                                    
## 3         3 <workflow> RANDOMFOREST                             
## 4         4 <fit[+]>   PROPHET W/ XGBOOST ERRORS                
## 5         5 <fit[+]>   PROPHET                                  
## 6         6 <fit[+]>   SEASONAL DECOMP: ARIMA(0,1,0)            
## 7         7 <fit[+]>   PROPHET
calibration_table <- models_tbl %>%
  modeltime_calibrate(testing(splits))

calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = TRUE)
calibration_table %>%
  modeltime_forecast(actual_data = r_drug) %>%
  plot_modeltime_forecast(.interactive = TRUE)
## Using '.calibration_data' to forecast.
refit_tbl <- calibration_table %>%
  modeltime_refit(data = r_drug)
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 12 observations per 1 year
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#(Removing the first 3 models from the display shows a more detailed prediction with less error)
refit_tbl %>%
  modeltime_forecast(h = "36 months", actual_data = r_drug) %>%
  filter(.model_desc != 'ACTUAL') %>%
  plot_modeltime_forecast(
    .legend_max_width = 25, # For mobile screens
    .interactive      = TRUE

  )

Results

On the traditional models:

Stationarity test on the residuals: the p-value is greater than 0.05, so we cannot treat the residuals as stationary, and we therefore judged the data unsuitable for an ARIMA model.

In particular, we used time regression (a linear trend fitted with TSLM) to predict the trend in the annual data. Sales in all three classes of supermarket show an upward trend.
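
The trend regression can be sketched in base R as follows. This is a minimal illustration rather than the code used for the plots: it takes the four complete years (2005 to 2008) of the mass table printed earlier, sums them to annual totals, and fits `lm()` as a stand-in for `TSLM(weight_sumy ~ trend())`. The names `mass_monthly`, `annual` and `total` are ours, and the half-year padding of 2004 and 2009 is left out.

```r
# Monthly weight_sum for the mass sheet, complete years 2005-2008,
# copied from the table printed earlier in this report.
mass_monthly <- c(
  74669.68, 72249.02, 102161.30, 87188.99, 106876.88, 97462.63,
  98092.07, 121453.48, 127620.16, 126692.42, 288042.14, 219544.93,   # 2005
  89538.56, 92565.49, 122303.10, 132980.34, 119736.47, 129529.64,
  129191.30, 167102.11, 165528.31, 205853.13, 500263.80, 394406.88,  # 2006
  146916.08, 127514.05, 153289.88, 160318.33, 160276.26, 193179.67,
  180719.53, 175110.34, 184430.37, 217424.85, 411815.17, 276533.09,  # 2007
  151681.97, 161079.95, 178302.13, 192392.93, 203737.72, 190552.49,
  174212.97, 183385.30, 152574.26, 143571.35, 425120.34, 260701.99   # 2008
)
year <- rep(2005:2008, each = 12)

# Annual totals (illustrative names; the report's yearly column is weight_sumy)
annual <- data.frame(
  year  = 2005:2008,
  total = as.numeric(tapply(mass_monthly, year, sum))
)

# Linear time trend, a base-R stand-in for TSLM(y ~ trend())
fit <- lm(total ~ year, data = annual)

# Three-year-ahead forecast with 95% prediction intervals
new_years <- data.frame(year = 2009:2011)
fc <- predict(fit, newdata = new_years, interval = "prediction", level = 0.95)
print(cbind(new_years, fc))
```

With only four annual points the prediction intervals are wide, which is the limitation noted at the start of the report.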

The Holt-Winters seasonal method consists of a forecasting equation and three smoothing equations (for the level, the trend and the seasonal component). The model identified monthly and quarterly seasonal patterns and the growth trend at the end of the data, and its forecasts match the test data. However, the two seasonal HW models did not fit equally well on some of the data.
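
As a rough illustration of the three smoothing components, the sketch below runs base R's `HoltWinters()` on the monthly mass series printed earlier. The smoothing parameters (`alpha` for level, `beta` for trend, `gamma` for season) are fixed at illustrative values that we chose for the example; they are not the values from our fitted models, and leaving them out lets `HoltWinters()` estimate them instead.

```r
# Monthly weight_sum for the mass sheet, 2004_Jul to 2009_Jun (60 values),
# copied from the table printed earlier in this report.
y <- c(
  49654.69, 51592.69, 66282.22, 107817.49, 371777.62, 217465.37,     # 2004 Jul-Dec
  74669.68, 72249.02, 102161.30, 87188.99, 106876.88, 97462.63,
  98092.07, 121453.48, 127620.16, 126692.42, 288042.14, 219544.93,   # 2005
  89538.56, 92565.49, 122303.10, 132980.34, 119736.47, 129529.64,
  129191.30, 167102.11, 165528.31, 205853.13, 500263.80, 394406.88,  # 2006
  146916.08, 127514.05, 153289.88, 160318.33, 160276.26, 193179.67,
  180719.53, 175110.34, 184430.37, 217424.85, 411815.17, 276533.09,  # 2007
  151681.97, 161079.95, 178302.13, 192392.93, 203737.72, 190552.49,
  174212.97, 183385.30, 152574.26, 143571.35, 425120.34, 260701.99,  # 2008
  131176.61, 93818.82, 124689.89, 148483.22, 142857.75, 159483.25    # 2009 Jan-Jun
)
mass_ts <- ts(y, start = c(2004, 7), frequency = 12)

# Additive Holt-Winters with fixed, illustrative smoothing parameters
# (assumed values, not estimates from this report).
hw_fit <- HoltWinters(mass_ts, alpha = 0.3, beta = 0.05, gamma = 0.3,
                      seasonal = "additive")

# Forecast the next 12 months from the final level, trend and seasonal terms
hw_fc <- predict(hw_fit, n.ahead = 12)
print(round(hw_fc))
```

Switching `seasonal = "additive"` to `"multiplicative"` gives the other seasonal form, which is one way to compare how the two behave on the same series.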

For the monthly forecasts, we used the STL and ETS models to forecast the seasonal pattern and the trend. The forecasts show an upward trend over the next three years.

On the integrated models: auto_arima_xgboost, randomForest, earth, prophet_xgboost, stlm_ets, stlm_arima, prophet. (modeltime combines time series data well with machine learning models.)

Prophet automatically places candidate trend changepoints in the first 80 percent of the historical data (changepoint.range = 0.8) and extrapolates the trend and the seasonal cycles from them. XGBoost is then trained on the Prophet residuals. The RMSE of the ARIMA model is particularly large, which we attribute to this data not suiting ARIMA, whereas Prophet with XGBoost-trained residuals gives good results.

RANDOMFOREST has the advantage of handling non-linear regression problems, but here its RMSE remains high.

EARTH is a piecewise (segmented) regression. Again, it does not suit this data.

The good performers are prophet_xgboost, stlm_ets, stlm_arima and prophet. stlm_ets and stlm_arima are seasonal decomposition models, and prophet_xgboost differs from prophet in that XGBoost is additionally trained on the residuals. These four models were fitted to log(y), so their predicted values and their scale-dependent errors (MAE, RMSE) are on the log scale: they are small and cannot be compared directly with those of the first three models, which were fitted to y. The trend and seasonality are nevertheless predicted well. Hiding the first three models in the plot makes the details of the other four visible.
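
To see why log-scale errors look so small, the sketch below computes RMSE on both scales for the six hold-out months of the mass sheet (2009_Jan to 2009_Jun). The predictions are hypothetical: every month is simply over-predicted by 10 percent, so this is not output from any of our models. The helper `rmse()` is our own.

```r
# Hold-out actuals: mass weight_sum for 2009_Jan to 2009_Jun (the last 6 rows)
actual <- c(131176.61, 93818.82, 124689.89, 148483.22, 142857.75, 159483.25)

# Hypothetical log-scale predictions: every month over-predicted by 10%.
# Illustrative values only, not output from any model in this report.
pred_log <- log(actual * 1.10)

rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# RMSE measured on the log scale, as reported for the log(y) models
rmse_log <- rmse(log(actual), pred_log)

# RMSE in the original units, after back-transforming with exp()
rmse_orig <- rmse(actual, exp(pred_log))

print(c(log_scale = rmse_log, original_scale = rmse_orig))
```

A log-scale RMSE of about 0.1 corresponds to roughly a 10 percent relative error, so a fair comparison with the models fitted to y needs the back-transformed value.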

The XGBoost component uses fixed parameters. Tuning could improve accuracy, but as the Prophet component already works well on this data, the additional gain is likely to be small.

Because our modeltime workflow uses only monthly and yearly data, the novel models produce no quarterly forecasts.

All the models capture the upward trend. However, the novel models are more accurate and detailed than the time regression model.

Conclusions

Different models should be used to meet different forecasting needs at different data intervals (and with different amounts of data, since the quarterly and yearly series differ in length).

For example, we use both time regression and machine learning algorithms to forecast yearly trends, but the machine learning framework differs from the traditional one. In the modeltime package the imported data is monthly, so the missing half-years in 2004 and 2009 are handled more flexibly when forecasting yearly trends. The 36-month forecast starts from the end of the data (June 2009) rather than being tied to the calendar years 2010 to 2012, so the data processing interval is more flexible.
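
For comparison, the yearly time regression needs a full-year total for 2004 and 2009. The sketch below shows one way to pad a half-year, under the assumption that each missing month is filled with the mean of the observed months; that is how we read `food_year[1,3] + exp04f*6` in the code above. It uses the mass figures for 2004, and the variable names are illustrative.

```r
# Observed mass weight_sum for 2004 (only Jul-Dec are available)
obs_2004 <- c(49654.69, 51592.69, 66282.22, 107817.49, 371777.62, 217465.37)

# Assumption: each missing month is filled with the mean of the observed months.
monthly_mean  <- mean(obs_2004)
missing_count <- 12 - length(obs_2004)

# Padded annual total = observed total + mean * number of missing months
annual_2004 <- sum(obs_2004) + monthly_mean * missing_count
print(annual_2004)
```

With six observed months this doubles the observed total. Since the observed half of 2004 contains the November peak, the padded figure may overstate the year, which is one reason the monthly models are the more flexible option here.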

Exponential smoothing and HW perform very well in seasonal forecasting, but combining traditional models with machine learning algorithms gives even better results. The novel models for time series problems include not only machine learning models but, more commonly, deep learning models: most of what we found in the literature search was related to deep learning. However, the preparation time for the exam was limited and training a deep learning model takes more than a month, so we chose a faster-training machine learning model instead. Although there were some limitations, we were able to complete all the tasks.